This analysis looks at the count of gestures as a function of whether the concomitant word is iconic or not.
There are three main variables:

- gesture yes/no, which includes all types of gestures (iconic, deictic, beat, metaphoric)
- iconic_gesture yes/no, which looks at whether those videos with a gesture also feature an iconic gesture
- non_iconic_gesture, which is gesture minus iconic gesture and is constructed in the script

The models will not be computed in the markdown due to the long estimation time. This is why the code chunks have eval = FALSE as a setting. Instead, the pre-compiled models are loaded into R from the models folder.
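A minimal sketch of that save-then-reload workflow, using base R's `saveRDS()`/`readRDS()` (the file name and placeholder object are hypothetical; brms can also save fits directly via its `file` argument):

```r
# General pattern for slow model fits: fit once in a chunk with
# eval = FALSE and save to disk, then reload when knitting.
# A plain list stands in for a fitted brms model here:
mdl <- list(note = 'stand-in for a fitted brms model')
model_path <- file.path(tempdir(), 'gesture_mdl.rds')

saveRDS(mdl, model_path)    # done once, offline

mdl <- readRDS(model_path)  # done every time the markdown is knitted
```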
Let’s load packages:
# Load packages:
library(tidyverse) # for data processing and visualization
library(patchwork) # for multi-plot arrays
library(brms) # for bayesian analysis
library(effsize) # for Cohen's d effect size
library(dplyr) # for data manipulation
For reproducibility, report version numbers:
R.Version()$version.string
## [1] "R version 4.4.0 (2024-04-24)"
packageVersion('tidyverse')
## [1] '2.0.0'
packageVersion('patchwork')
## [1] '1.2.0'
packageVersion('brms')
## [1] '2.21.0'
Load data:
df <- read_csv('../data/ALL_videos_coded_18_06_24.csv')
Let’s get rid of all rows without a word_used value; these are the ones that have not been coded.
df <- filter(df, !is.na(word_used))
Let’s make the word column lowercase:
df <- mutate(df, word = str_to_lower(word))
Let’s see for how many videos the word wasn’t actually used:
df |>
count(word_used) |>
mutate(proportion = n / sum(n))
## # A tibble: 2 × 3
## word_used n proportion
## <dbl> <int> <dbl>
## 1 0 1921 0.336
## 2 1 3804 0.664
Let’s reduce the data frame to only those cases where
word_used is equal to 1.
df <- filter(df, word_used == 1)
Let’s do a count of the duplicates:
df |>
count(not_duplicate) |>
mutate(proportion = n / sum(n))
## # A tibble: 2 × 3
## not_duplicate n proportion
## <dbl> <int> <dbl>
## 1 0 682 0.179
## 2 1 3122 0.821
Let’s get rid of the duplicates:
df <- filter(df, not_duplicate == 1)
Let’s count whether the speaker was visible:
df |>
count(speaker_visible) |>
mutate(proportion = n / sum(n))
## # A tibble: 2 × 3
## speaker_visible n proportion
## <dbl> <int> <dbl>
## 1 0 899 0.288
## 2 1 2223 0.712
Let’s take only those for which the speaker is visible:
df <- filter(df, speaker_visible == 1)
Let’s count whether the hands were visible:
df |>
count(hands_visible) |>
mutate(proportion = n / sum(n))
## # A tibble: 2 × 3
## hands_visible n proportion
## <dbl> <int> <dbl>
## 1 0 601 0.270
## 2 1 1622 0.730
Let’s take only those for which the hands are visible:
df <- filter(df, hands_visible == 1)
Let’s count whether the hands were free:
df |>
count(hands_free) |>
mutate(proportion = n / sum(n))
## # A tibble: 2 × 3
## hands_free n proportion
## <dbl> <int> <dbl>
## 1 0 76 0.0469
## 2 1 1546 0.953
The 0’s here include only those cases for which it was clearly impossible for the speaker to gesture. So, we’ll keep only those cases where the speaker was free to gesture.
df <- filter(df, hands_free == 1)
Not 100% sure about this, but for now, let’s set iconic_gesture to 0 for the
cases that have 0 in the gesture column, so that those
cases are coded consistently and we don’t treat
them as iconic gestures by accident.
df <- mutate(df,
iconic_gesture = ifelse(gesture == 0, 0, iconic_gesture))
For plotting later, it makes sense to switch the
non_iconic label to a less techy-looking
non-iconic, and also change the order so that the
non-iconic level then comes first:
df <- mutate(df,
type = ifelse(type == 'non_iconic',
'non-iconic word', 'iconic word'),
type = factor(type,
levels = c('non-iconic word', 'iconic word')))
Let’s check the counts of the individual words:
# Save the word counts:
word_counts <- df |>
count(type, word, sort = TRUE)
# Show:
word_counts |>
print(n = Inf)
## # A tibble: 45 × 3
## type word n
## <fct> <chr> <int>
## 1 non-iconic word said 303
## 2 iconic word spank 97
## 3 iconic word slushy 67
## 4 non-iconic word realize 58
## 5 non-iconic word inform 57
## 6 non-iconic word wearing 56
## 7 iconic word yucky 56
## 8 non-iconic word knew 54
## 9 non-iconic word filling 51
## 10 non-iconic word other 50
## 11 non-iconic word exact 46
## 12 non-iconic word grateful 43
## 13 non-iconic word prevail 40
## 14 iconic word squish 37
## 15 non-iconic word tamper 36
## 16 iconic word splotch 36
## 17 non-iconic word put 35
## 18 non-iconic word confirmed 34
## 19 non-iconic word rejoin 34
## 20 non-iconic word discern 32
## 21 non-iconic word jealous 30
## 22 iconic word puffy 24
## 23 non-iconic word covet 22
## 24 non-iconic word ordain 21
## 25 iconic word swish 21
## 26 iconic word wring 21
## 27 iconic word saggy 17
## 28 iconic word swoosh 15
## 29 iconic word zap 15
## 30 non-iconic word outwit 14
## 31 iconic word chomp 14
## 32 iconic word crispy 14
## 33 non-iconic word absent 13
## 34 iconic word wheeze 13
## 35 iconic word woof 13
## 36 non-iconic word sullen 11
## 37 iconic word barking 11
## 38 non-iconic word acquaint 8
## 39 iconic word bang 6
## 40 iconic word munch 5
## 41 iconic word plump 5
## 42 iconic word wobbly 5
## 43 non-iconic word barren 3
## 44 iconic word snap 2
## 45 iconic word gooey 1
Create a non_iconic_gesture variable, which is
gesture minus iconic_gesture. This variable
specifically picks out all the cases of gestures that are not
iconic.
# Seed with NA values:
df$non_iconic_gesture <- NA
# Create variable:
df <- mutate(df,
non_iconic_gesture = case_when(gesture == 1 & iconic_gesture == 0 ~ 'gesture (non-iconic)',
gesture == 0 & iconic_gesture == 0 ~ 'no gesture'))
# Double check:
df |>
distinct(gesture, iconic_gesture, non_iconic_gesture) |>
select(gesture, iconic_gesture, non_iconic_gesture)
## # A tibble: 4 × 3
## gesture iconic_gesture non_iconic_gesture
## <dbl> <dbl> <chr>
## 1 1 0 gesture (non-iconic)
## 2 0 0 no gesture
## 3 1 1 <NA>
## 4 1 NA <NA>
That’s correct.
Let’s count the overall average gesture rate:
# Save:
gesture_counts <- df |>
count(gesture) |>
mutate(proportion = n / sum(n))
# Show:
gesture_counts
## # A tibble: 2 × 3
## gesture n proportion
## <dbl> <int> <dbl>
## 1 0 664 0.429
## 2 1 882 0.571
Let’s count iconic gestures:
# Save:
iconic_counts <- df |>
filter(gesture == 1) |>
count(iconic_gesture) |>
mutate(proportion = n / sum(n))
# Show:
iconic_counts
## # A tibble: 3 × 3
## iconic_gesture n proportion
## <dbl> <int> <dbl>
## 1 0 671 0.761
## 2 1 210 0.238
## 3 NA 1 0.00113
Let’s count the overall gestures by word type:
# Save:
all_gesture_by_type <- df |>
count(type, gesture) |>
group_by(type) |>
mutate(proportion = n / sum(n))
# Show:
all_gesture_by_type
## # A tibble: 4 × 4
## # Groups: type [2]
## type gesture n proportion
## <fct> <dbl> <int> <dbl>
## 1 non-iconic word 0 514 0.489
## 2 non-iconic word 1 537 0.511
## 3 iconic word 0 150 0.303
## 4 iconic word 1 345 0.697
Let’s count the iconic gestures by word type, over all gestures:
# Save:
iconic_by_type_over_gestures <- df |>
filter(gesture == 1) |>
count(type, iconic_gesture) |>
group_by(type) |>
mutate(proportion = n / sum(n))
# Show:
iconic_by_type_over_gestures
## # A tibble: 5 × 4
## # Groups: type [2]
## type iconic_gesture n proportion
## <fct> <dbl> <int> <dbl>
## 1 non-iconic word 0 435 0.810
## 2 non-iconic word 1 101 0.188
## 3 non-iconic word NA 1 0.00186
## 4 iconic word 0 236 0.684
## 5 iconic word 1 109 0.316
Let’s do the same again, but this time only for those that are not iconic gestures.
# Save:
other_by_type_over_gestures <- df |>
filter(gesture == 1) |>
count(type, non_iconic_gesture) |>
group_by(type) |>
mutate(proportion = n / sum(n))
# Show:
other_by_type_over_gestures
## # A tibble: 4 × 4
## # Groups: type [2]
## type non_iconic_gesture n proportion
## <fct> <chr> <int> <dbl>
## 1 non-iconic word gesture (non-iconic) 435 0.810
## 2 non-iconic word <NA> 102 0.190
## 3 iconic word gesture (non-iconic) 236 0.684
## 4 iconic word <NA> 109 0.316
Let’s count the iconic gestures by word type, over all eligible tokens:
#Save:
iconic_by_type_over_eligible_tokens <- df |>
filter(gesture == 1) |>
count(type, iconic_gesture) |>
group_by(type) |>
mutate(proportion = n / sum(df$hands_free[df$type == first(type)] == 1))
#Show:
iconic_by_type_over_eligible_tokens
## # A tibble: 5 × 4
## # Groups: type [2]
## type iconic_gesture n proportion
## <fct> <dbl> <int> <dbl>
## 1 non-iconic word 0 435 0.414
## 2 non-iconic word 1 101 0.0961
## 3 non-iconic word NA 1 0.000951
## 4 iconic word 0 236 0.477
## 5 iconic word 1 109 0.220
Let’s do the same again, but this time only for those that are not iconic gestures.
# Save:
other_by_type_over_eligible_tokens <- df |>
filter(gesture == 1) |>
count(type, non_iconic_gesture) |>
group_by(type) |>
mutate(proportion = n / sum(df$hands_free[df$type == first(type)] == 1))
#Show:
other_by_type_over_eligible_tokens
## # A tibble: 4 × 4
## # Groups: type [2]
## type non_iconic_gesture n proportion
## <fct> <chr> <int> <dbl>
## 1 non-iconic word gesture (non-iconic) 435 0.414
## 2 non-iconic word <NA> 102 0.0971
## 3 iconic word gesture (non-iconic) 236 0.477
## 4 iconic word <NA> 109 0.220
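Because this “eligible tokens” denominator is needed repeatedly, an alternative pattern (shown on a small stand-in tibble, not the actual data) is to precompute the per-type totals once and join them in, rather than indexing the full df from inside a grouped mutate():

```r
library(tidyverse)

# Stand-in data: three eligible tokens of type 'a', one of type 'b':
demo <- tibble(type = c('a', 'a', 'a', 'b'),
               gesture = c(1, 1, 0, 1))

# Per-type totals computed once:
totals <- demo |> count(type, name = 'n_eligible')

# Gesture counts joined against the totals, then proportions:
props <- demo |>
  filter(gesture == 1) |>
  count(type) |>
  left_join(totals, by = 'type') |>
  mutate(proportion = n / n_eligible)
```

This avoids the vector-recycling pitfall of comparing a full-length column against a grouped one.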
Let’s do both of these things on a word-by-word basis. Gesture counts first:
# Save counts:
all_gesture_by_word <- df |>
count(word, gesture) |>
group_by(word) |>
mutate(proportion = n / sum(n)) |>
filter(gesture == 1) |>
select(-gesture)
# Show:
all_gesture_by_word |>
arrange(desc(proportion)) |>
print(n = Inf)
## # A tibble: 44 × 3
## # Groups: word [44]
## word n proportion
## <chr> <int> <dbl>
## 1 bang 6 1
## 2 snap 2 1
## 3 splotch 34 0.944
## 4 chomp 13 0.929
## 5 squish 33 0.892
## 6 zap 13 0.867
## 7 exact 38 0.826
## 8 wring 17 0.810
## 9 plump 4 0.8
## 10 wobbly 4 0.8
## 11 filling 40 0.784
## 12 discern 25 0.781
## 13 put 27 0.771
## 14 ordain 16 0.762
## 15 slushy 49 0.731
## 16 crispy 10 0.714
## 17 swish 15 0.714
## 18 wearing 40 0.714
## 19 spank 67 0.691
## 20 barren 2 0.667
## 21 swoosh 10 0.667
## 22 other 33 0.66
## 23 tamper 23 0.639
## 24 barking 7 0.636
## 25 covet 14 0.636
## 26 acquaint 5 0.625
## 27 woof 8 0.615
## 28 realize 35 0.603
## 29 munch 3 0.6
## 30 knew 32 0.593
## 31 outwit 8 0.571
## 32 puffy 13 0.542
## 33 rejoin 17 0.5
## 34 confirmed 16 0.471
## 35 jealous 14 0.467
## 36 yucky 26 0.464
## 37 sullen 5 0.455
## 38 wheeze 5 0.385
## 39 said 111 0.366
## 40 saggy 6 0.353
## 41 prevail 14 0.35
## 42 grateful 12 0.279
## 43 inform 9 0.158
## 44 absent 1 0.0769
Proportion of iconic gestures (out of videos with gesture) by word:
# Save counts:
iconic_by_word_over_gestures <- df |>
count(word, iconic_gesture) |>
group_by(word) |>
mutate(proportion = n / sum(n)) |>
filter(iconic_gesture == 1) |>
select(-iconic_gesture, -n)
# Show:
iconic_by_word_over_gestures |>
arrange(desc(proportion)) |>
print(n = Inf)
## # A tibble: 35 × 2
## # Groups: word [35]
## word proportion
## <chr> <dbl>
## 1 bang 0.833
## 2 squish 0.811
## 3 wobbly 0.6
## 4 wring 0.524
## 5 put 0.429
## 6 woof 0.385
## 7 chomp 0.357
## 8 barren 0.333
## 9 zap 0.333
## 10 swish 0.286
## 11 splotch 0.278
## 12 tamper 0.222
## 13 discern 0.219
## 14 exact 0.217
## 15 munch 0.2
## 16 other 0.18
## 17 filling 0.137
## 18 swoosh 0.133
## 19 spank 0.124
## 20 slushy 0.119
## 21 inform 0.105
## 22 jealous 0.1
## 23 covet 0.0909
## 24 puffy 0.0833
## 25 wheeze 0.0769
## 26 knew 0.0741
## 27 crispy 0.0714
## 28 outwit 0.0714
## 29 realize 0.0690
## 30 confirmed 0.0588
## 31 rejoin 0.0588
## 32 wearing 0.0536
## 33 said 0.0495
## 34 grateful 0.0465
## 35 yucky 0.0357
Proportion of other gestures (out of videos with gesture) by word. Note that this count keeps both levels of non_iconic_gesture, so each word appears twice: once for the non-iconic gesture proportion and once for its complement (the NA rows, i.e., the iconic gestures):
# Save counts:
other_by_word_over_gestures <- df |>
filter(gesture == 1) |>
count(word, non_iconic_gesture) |>
group_by(word) |>
mutate(proportion = n / sum(n)) |>
select(-non_iconic_gesture, -n)
# Show:
other_by_word_over_gestures |>
arrange(desc(proportion)) |>
print(n = Inf)
## # A tibble: 79 × 2
## # Groups: word [44]
## word proportion
## <chr> <dbl>
## 1 absent 1
## 2 acquaint 1
## 3 barking 1
## 4 ordain 1
## 5 plump 1
## 6 prevail 1
## 7 saggy 1
## 8 snap 1
## 9 sullen 1
## 10 wearing 0.925
## 11 yucky 0.923
## 12 squish 0.909
## 13 crispy 0.9
## 14 rejoin 0.882
## 15 confirmed 0.875
## 16 knew 0.875
## 17 outwit 0.875
## 18 said 0.865
## 19 covet 0.857
## 20 realize 0.857
## 21 puffy 0.846
## 22 slushy 0.837
## 23 bang 0.833
## 24 grateful 0.833
## 25 filling 0.825
## 26 spank 0.821
## 27 swoosh 0.8
## 28 wheeze 0.8
## 29 jealous 0.786
## 30 wobbly 0.75
## 31 exact 0.737
## 32 other 0.727
## 33 discern 0.72
## 34 splotch 0.706
## 35 inform 0.667
## 36 munch 0.667
## 37 tamper 0.652
## 38 wring 0.647
## 39 woof 0.625
## 40 chomp 0.615
## 41 zap 0.615
## 42 swish 0.6
## 43 put 0.556
## 44 barren 0.5
## 45 barren 0.5
## 46 put 0.444
## 47 swish 0.4
## 48 chomp 0.385
## 49 zap 0.385
## 50 woof 0.375
## 51 wring 0.353
## 52 tamper 0.348
## 53 inform 0.333
## 54 munch 0.333
## 55 splotch 0.294
## 56 discern 0.28
## 57 other 0.273
## 58 exact 0.263
## 59 wobbly 0.25
## 60 jealous 0.214
## 61 swoosh 0.2
## 62 wheeze 0.2
## 63 spank 0.179
## 64 filling 0.175
## 65 bang 0.167
## 66 grateful 0.167
## 67 slushy 0.163
## 68 puffy 0.154
## 69 covet 0.143
## 70 realize 0.143
## 71 said 0.135
## 72 confirmed 0.125
## 73 knew 0.125
## 74 outwit 0.125
## 75 rejoin 0.118
## 76 crispy 0.1
## 77 squish 0.0909
## 78 yucky 0.0769
## 79 wearing 0.075
Proportion of iconic gestures (out of eligible tokens) by word:
# Save counts:
iconic_by_word_over_eligible_tokens <- df |>
filter(iconic_gesture == 1) |>
count(word, iconic_gesture) |>
group_by(word) |>
mutate(proportion = n / sum(df$hands_free[df$word == first(word)] == 1)) |>
select(-iconic_gesture)
# Show:
iconic_by_word_over_eligible_tokens |>
arrange(desc(proportion)) |>
print(n = Inf)
## # A tibble: 35 × 3
## # Groups: word [35]
## word n proportion
## <chr> <int> <dbl>
## 1 bang 5 0.833
## 2 squish 30 0.811
## 3 wobbly 3 0.6
## 4 wring 11 0.524
## 5 put 15 0.429
## 6 woof 5 0.385
## 7 chomp 5 0.357
## 8 barren 1 0.333
## 9 zap 5 0.333
## 10 swish 6 0.286
## 11 splotch 10 0.278
## 12 tamper 8 0.222
## 13 discern 7 0.219
## 14 exact 10 0.217
## 15 munch 1 0.2
## 16 other 9 0.18
## 17 filling 7 0.137
## 18 swoosh 2 0.133
## 19 spank 12 0.124
## 20 slushy 8 0.119
## 21 inform 6 0.105
## 22 jealous 3 0.1
## 23 covet 2 0.0909
## 24 puffy 2 0.0833
## 25 wheeze 1 0.0769
## 26 knew 4 0.0741
## 27 crispy 1 0.0714
## 28 outwit 1 0.0714
## 29 realize 4 0.0690
## 30 confirmed 2 0.0588
## 31 rejoin 2 0.0588
## 32 wearing 3 0.0536
## 33 said 15 0.0495
## 34 grateful 2 0.0465
## 35 yucky 2 0.0357
Proportion of other/non-iconic gestures (out of eligible tokens) by word:
# Save counts:
other_by_word_over_eligible_tokens <- df |>
filter(gesture == 1) |>
count(word, non_iconic_gesture) |>
group_by(word) |>
mutate(proportion = n / sum(df$hands_free[df$word == first(word)] == 1)) |>
filter(!is.na(non_iconic_gesture))|> # Filter out rows with NA in non_iconic_gesture
select(-non_iconic_gesture)
# Show:
other_by_word_over_eligible_tokens |>
arrange(desc(proportion)) |>
print(n = Inf)
## # A tibble: 44 × 3
## # Groups: word [44]
## word n proportion
## <chr> <int> <dbl>
## 1 snap 2 1
## 2 plump 4 0.8
## 3 ordain 16 0.762
## 4 splotch 24 0.667
## 5 wearing 37 0.661
## 6 filling 33 0.647
## 7 crispy 9 0.643
## 8 barking 7 0.636
## 9 acquaint 5 0.625
## 10 slushy 41 0.612
## 11 exact 28 0.609
## 12 chomp 8 0.571
## 13 spank 55 0.567
## 14 discern 18 0.562
## 15 covet 12 0.545
## 16 swoosh 8 0.533
## 17 zap 8 0.533
## 18 knew 28 0.519
## 19 realize 30 0.517
## 20 outwit 7 0.5
## 21 other 24 0.48
## 22 puffy 11 0.458
## 23 sullen 5 0.455
## 24 rejoin 15 0.441
## 25 swish 9 0.429
## 26 yucky 24 0.429
## 27 tamper 15 0.417
## 28 confirmed 14 0.412
## 29 munch 2 0.4
## 30 jealous 11 0.367
## 31 saggy 6 0.353
## 32 prevail 14 0.35
## 33 put 12 0.343
## 34 barren 1 0.333
## 35 said 96 0.317
## 36 wheeze 4 0.308
## 37 wring 6 0.286
## 38 grateful 10 0.233
## 39 woof 3 0.231
## 40 wobbly 1 0.2
## 41 bang 1 0.167
## 42 squish 3 0.0811
## 43 absent 1 0.0769
## 44 inform 3 0.0526
Let’s merge the counts and the proportions into a big table showing everything on a by-word basis:
# Save:
by_word_all <- word_counts |>
rename(no_of_eligible_tokens = n) |>
left_join(all_gesture_by_word, by = "word") |>
rename(gesture_rate = proportion) |>
rename(no_of_gesture = n) |>
left_join(iconic_by_word_over_eligible_tokens, by = "word") |>
rename(iconic_gesture_rate = proportion) |>
rename(no_of_iconic = n) |>
left_join(other_by_word_over_eligible_tokens, by = "word") |>
rename(other_gesture_rate = proportion) |>
rename(no_of_other = n) |>
mutate(type = str_replace(type, '\n', ''),
type = factor(type, levels = c('non-iconic word', 'iconic word')))
# Show:
by_word_all |>
print(n = Inf)
## # A tibble: 45 × 9
## type word no_of_eligible_tokens no_of_gesture gesture_rate no_of_iconic
## <fct> <chr> <int> <int> <dbl> <int>
## 1 non-icon… said 303 111 0.366 15
## 2 iconic w… spank 97 67 0.691 12
## 3 iconic w… slus… 67 49 0.731 8
## 4 non-icon… real… 58 35 0.603 4
## 5 non-icon… info… 57 9 0.158 6
## 6 non-icon… wear… 56 40 0.714 3
## 7 iconic w… yucky 56 26 0.464 2
## 8 non-icon… knew 54 32 0.593 4
## 9 non-icon… fill… 51 40 0.784 7
## 10 non-icon… other 50 33 0.66 9
## 11 non-icon… exact 46 38 0.826 10
## 12 non-icon… grat… 43 12 0.279 2
## 13 non-icon… prev… 40 14 0.35 NA
## 14 iconic w… squi… 37 33 0.892 30
## 15 non-icon… tamp… 36 23 0.639 8
## 16 iconic w… splo… 36 34 0.944 10
## 17 non-icon… put 35 27 0.771 15
## 18 non-icon… conf… 34 16 0.471 2
## 19 non-icon… rejo… 34 17 0.5 2
## 20 non-icon… disc… 32 25 0.781 7
## 21 non-icon… jeal… 30 14 0.467 3
## 22 iconic w… puffy 24 13 0.542 2
## 23 non-icon… covet 22 14 0.636 2
## 24 non-icon… orda… 21 16 0.762 NA
## 25 iconic w… swish 21 15 0.714 6
## 26 iconic w… wring 21 17 0.810 11
## 27 iconic w… saggy 17 6 0.353 NA
## 28 iconic w… swoo… 15 10 0.667 2
## 29 iconic w… zap 15 13 0.867 5
## 30 non-icon… outw… 14 8 0.571 1
## 31 iconic w… chomp 14 13 0.929 5
## 32 iconic w… cris… 14 10 0.714 1
## 33 non-icon… abse… 13 1 0.0769 NA
## 34 iconic w… whee… 13 5 0.385 1
## 35 iconic w… woof 13 8 0.615 5
## 36 non-icon… sull… 11 5 0.455 NA
## 37 iconic w… bark… 11 7 0.636 NA
## 38 non-icon… acqu… 8 5 0.625 NA
## 39 iconic w… bang 6 6 1 5
## 40 iconic w… munch 5 3 0.6 1
## 41 iconic w… plump 5 4 0.8 NA
## 42 iconic w… wobb… 5 4 0.8 3
## 43 non-icon… barr… 3 2 0.667 1
## 44 iconic w… snap 2 2 1 NA
## 45 iconic w… gooey 1 NA NA NA
## # ℹ 3 more variables: iconic_gesture_rate <dbl>, no_of_other <int>,
## # other_gesture_rate <dbl>
The NA’s in this table are true zeros, so they should be
replaced with counts and proportions of 0.
by_word_all <- mutate(by_word_all,
no_of_gesture = ifelse(is.na(no_of_gesture),
0, no_of_gesture),
gesture_rate = ifelse(is.na(gesture_rate),
0, gesture_rate),
no_of_iconic = ifelse(is.na(no_of_iconic),
0, no_of_iconic),
iconic_gesture_rate = ifelse(is.na(iconic_gesture_rate),
0, iconic_gesture_rate),
no_of_other = ifelse(is.na(no_of_other),
0, no_of_other),
other_gesture_rate = ifelse(is.na(other_gesture_rate),
0, other_gesture_rate))
# Show again:
by_word_all |>
print(n = Inf)
## # A tibble: 45 × 9
## type word no_of_eligible_tokens no_of_gesture gesture_rate no_of_iconic
## <fct> <chr> <int> <dbl> <dbl> <dbl>
## 1 non-icon… said 303 111 0.366 15
## 2 iconic w… spank 97 67 0.691 12
## 3 iconic w… slus… 67 49 0.731 8
## 4 non-icon… real… 58 35 0.603 4
## 5 non-icon… info… 57 9 0.158 6
## 6 non-icon… wear… 56 40 0.714 3
## 7 iconic w… yucky 56 26 0.464 2
## 8 non-icon… knew 54 32 0.593 4
## 9 non-icon… fill… 51 40 0.784 7
## 10 non-icon… other 50 33 0.66 9
## 11 non-icon… exact 46 38 0.826 10
## 12 non-icon… grat… 43 12 0.279 2
## 13 non-icon… prev… 40 14 0.35 0
## 14 iconic w… squi… 37 33 0.892 30
## 15 non-icon… tamp… 36 23 0.639 8
## 16 iconic w… splo… 36 34 0.944 10
## 17 non-icon… put 35 27 0.771 15
## 18 non-icon… conf… 34 16 0.471 2
## 19 non-icon… rejo… 34 17 0.5 2
## 20 non-icon… disc… 32 25 0.781 7
## 21 non-icon… jeal… 30 14 0.467 3
## 22 iconic w… puffy 24 13 0.542 2
## 23 non-icon… covet 22 14 0.636 2
## 24 non-icon… orda… 21 16 0.762 0
## 25 iconic w… swish 21 15 0.714 6
## 26 iconic w… wring 21 17 0.810 11
## 27 iconic w… saggy 17 6 0.353 0
## 28 iconic w… swoo… 15 10 0.667 2
## 29 iconic w… zap 15 13 0.867 5
## 30 non-icon… outw… 14 8 0.571 1
## 31 iconic w… chomp 14 13 0.929 5
## 32 iconic w… cris… 14 10 0.714 1
## 33 non-icon… abse… 13 1 0.0769 0
## 34 iconic w… whee… 13 5 0.385 1
## 35 iconic w… woof 13 8 0.615 5
## 36 non-icon… sull… 11 5 0.455 0
## 37 iconic w… bark… 11 7 0.636 0
## 38 non-icon… acqu… 8 5 0.625 0
## 39 iconic w… bang 6 6 1 5
## 40 iconic w… munch 5 3 0.6 1
## 41 iconic w… plump 5 4 0.8 0
## 42 iconic w… wobb… 5 4 0.8 3
## 43 non-icon… barr… 3 2 0.667 1
## 44 iconic w… snap 2 2 1 0
## 45 iconic w… gooey 1 0 0 0
## # ℹ 3 more variables: iconic_gesture_rate <dbl>, no_of_other <dbl>,
## # other_gesture_rate <dbl>
# Save outside of R:
by_word_all |>
write_csv('../data/by_word_gesture_rates.csv')
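As an aside, tidyr’s `replace_na()` can do the zero-filling above more concisely than the `ifelse()` chain; a sketch on a stand-in tibble with the same kind of true-zero NA’s:

```r
library(tidyverse)

# Stand-in for a slice of the by-word table, with true-zero NA's:
demo <- tibble(word = c('gooey', 'snap'),
               no_of_gesture = c(NA, 2),
               gesture_rate = c(NA, 1))

# replace_na() fills several columns in one mutate(across(...)) call:
demo <- demo |>
  mutate(across(c(no_of_gesture, gesture_rate),
                \(x) replace_na(x, 0)))
```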
Now let’s calculate the average gesture rate by word, then overall:
# Calculate the average gesture_rate across all words
average_gesture_rate_all <- by_word_all %>%
summarise(avg_gesture_rate_all = mean(gesture_rate, na.rm = TRUE))
# View the result
print(average_gesture_rate_all)
## # A tibble: 1 × 1
## avg_gesture_rate_all
## <dbl>
## 1 0.620
Now let’s calculate the average gesture rates by word, then by word type (using by_word_all).
First, overall gesture rate:
# Group by the 'type' column and calculate the average gesture_rate for each group
average_gesture_rate <- by_word_all %>%
group_by(type) %>%
summarise(avg_gesture_rate = mean(gesture_rate, na.rm = TRUE))
# View the result
print(average_gesture_rate)
## # A tibble: 2 × 2
## type avg_gesture_rate
## <fct> <dbl>
## 1 non-iconic word 0.555
## 2 iconic word 0.689
Then, iconic gesture rate:
# Group by the 'type' column and calculate the average iconic_gesture_rate for each group
average_iconic_gesture_rate <- by_word_all %>%
group_by(type) %>%
summarise(avg_gesture_rate = mean(iconic_gesture_rate, na.rm = TRUE))
# View the result
print(average_iconic_gesture_rate)
## # A tibble: 2 × 2
## type avg_gesture_rate
## <fct> <dbl>
## 1 non-iconic word 0.109
## 2 iconic word 0.239
And finally, other gesture rate:
# Group by the 'type' column and calculate the average other_gesture_rate for each group
average_other_gesture_rate <- by_word_all %>%
group_by(type) %>%
summarise(avg_gesture_rate = mean(other_gesture_rate, na.rm = TRUE))
# View the result
print(average_other_gesture_rate)
## # A tibble: 2 × 2
## type avg_gesture_rate
## <fct> <dbl>
## 1 non-iconic word 0.444
## 2 iconic word 0.450
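The three per-type averages above could also be computed in one pass by reshaping the rate columns to long format first (a sketch on a stand-in tibble with the same column layout as by_word_all):

```r
library(tidyverse)

# Stand-in with the same rate columns as by_word_all:
demo <- tibble(type = c('non-iconic word', 'iconic word', 'iconic word'),
               gesture_rate = c(0.5, 0.7, 0.9),
               iconic_gesture_rate = c(0.1, 0.2, 0.3),
               other_gesture_rate = c(0.4, 0.5, 0.6))

# One pass: reshape to long, then average within type and measure:
rate_summary <- demo |>
  pivot_longer(ends_with('_rate'),
               names_to = 'measure', values_to = 'rate') |>
  group_by(type, measure) |>
  summarise(avg_rate = mean(rate, na.rm = TRUE), .groups = 'drop')
```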
We have also coded the videos with gesture for the context of the videos. Let’s see what proportion of gestures comes from each context.
First, unscripted interviews:
# Count the number of gestures (gesture == 1)
total_gestures <- sum(df$gesture == 1)
# Count the number of gestures from unscripted interviews
unscripted_interview_count <- sum(df$gesture == 1 & df$`unscripted interview` == 1)
# Print the result
print(unscripted_interview_count)
## [1] 307
# Calculate the proportion
prop_unscripted_interview <- unscripted_interview_count / total_gestures
# Print the result
print(prop_unscripted_interview)
## [1] 0.3480726
Now, presenting to camera:
# Count the number of gestures from presenting to camera
presenting_camera_count <- sum(df$gesture == 1 & df$`presenting camera` == 1)
# Print the result
print(presenting_camera_count)
## [1] 134
# Calculate the proportion
prop_presenting_camera <- presenting_camera_count / total_gestures
# Print the result
print(prop_presenting_camera)
## [1] 0.1519274
Now, presenting in front of a screen:
# Count the number of gestures from presenting in front of screen
presenting_screen_count <- sum(df$gesture == 1 & df$`presenting screen` == 1)
# Print the result
print(presenting_screen_count)
## [1] 131
# Calculate the proportion
prop_presenting_screen <- presenting_screen_count / total_gestures
# Print the result
print(prop_presenting_screen)
## [1] 0.1485261
Now, giving a speech:
# Count the number of gestures from speeches
giving_speech_count <- sum(df$gesture == 1 & df$`giving speech` == 1)
# Print the result
print(giving_speech_count)
## [1] 195
# Calculate the proportion
prop_giving_speech <- giving_speech_count / total_gestures
# Print the result
print(prop_giving_speech)
## [1] 0.2210884
Speaking in court:
# Count the number of gestures from court
speaking_court_count <- sum(df$gesture == 1 & df$`speaking in court` == 1)
# Print the result
print(speaking_court_count)
## [1] 63
# Calculate the proportion
prop_speaking_court <- speaking_court_count / total_gestures
# Print the result
print(prop_speaking_court)
## [1] 0.07142857
Now, semi-scripted:
# Count the number of gestures from semi-scripted contexts
semi_scripted_count <- sum(df$gesture == 1 & df$`semi-scripted` == 1)
# Print the result
print(semi_scripted_count)
## [1] 33
# Calculate the proportion
prop_semi_scripted <- semi_scripted_count / total_gestures
# Print the result
print(prop_semi_scripted)
## [1] 0.03741497
Finally, scripted acting:
# Count the number of gestures from scripted acting
scripted_acting_count <- sum(df$gesture == 1 & df$`scripted acting` == 1)
# Print the result
print(scripted_acting_count)
## [1] 19
# Calculate the proportion
prop_scripted_acting <- scripted_acting_count / total_gestures
# Print the result
print(prop_scripted_acting)
## [1] 0.02154195
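The seven context blocks above all repeat the same computation, so they could equally be done in one pass; a sketch with stand-in data, using two of the backtick-quoted context column names from the chunks above:

```r
library(tidyverse)

# Stand-in: gesture flag plus two of the context indicator columns:
demo <- tibble(gesture = c(1, 1, 1, 0),
               `unscripted interview` = c(1, 0, 0, 1),
               `giving speech` = c(0, 1, 1, 0))

# Sum each context column over the gesture tokens, then compute proportions:
context_props <- demo |>
  filter(gesture == 1) |>
  summarise(across(-gesture, sum)) |>
  pivot_longer(everything(), names_to = 'context', values_to = 'n') |>
  mutate(proportion = n / sum(demo$gesture == 1))
```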
Let’s make a bar plot of the gesture counts:
# Basic plot:
gesture_p <- all_gesture_by_type |>
mutate(gesture = ifelse(gesture == 1, 'gesture', 'no gesture')) |>
mutate(gesture = factor(gesture,
levels = c('no gesture', 'gesture'))) |>
ggplot(aes(x = type,
y = proportion,
fill = gesture)) +
geom_col(width = 0.55)
# Axes and labels:
gesture_p <- gesture_p +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
name = '') +
scale_y_continuous(expand = c(0, 0)) +
xlab(NULL) +
ylab('All gesture proportion')
# Look and feel:
gesture_p <- gesture_p +
theme_classic()
# Show:
gesture_p
Same for iconic gestures only:
# Basic plot:
iconic_p <- iconic_by_type_over_eligible_tokens |>
mutate(iconic_gesture = ifelse(iconic_gesture == 1,
'iconic gesture',
'other gesture')) |>
mutate(iconic_gesture = factor(iconic_gesture,
levels = c('other gesture', 'iconic gesture'))) |>
ggplot(aes(x = type,
y = proportion,
fill = iconic_gesture)) +
geom_col(width = 0.55)
# Axes and labels:
iconic_p <- iconic_p +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
labels = c('no gesture', 'gesture'),
name = '') +
scale_y_continuous(expand = c(0, 0)) +
xlab(NULL) +
ylab('Iconic gesture proportion')
# Look and feel:
iconic_p <- iconic_p +
theme_classic()
# Show:
iconic_p
Same for other types of gestures only. This will be the rightmost
plot in our multiple plot array. For this reason we relabel the fill
legend to no gesture versus gesture, so that the
same legend can serve all three plots (gesture vs. no gesture, iconic
vs. other, etc.).
# Basic plot:
other_p <- other_by_type_over_eligible_tokens |>
mutate(non_iconic_gesture = ifelse(is.na(non_iconic_gesture), 'NA', non_iconic_gesture),
non_iconic_gesture = factor(non_iconic_gesture, levels = c('NA', 'gesture (non-iconic)'))) |>
ggplot(aes(x = type, y = proportion, fill = non_iconic_gesture)) +
geom_col(width = 0.55)
# Axes and labels:
other_p <- other_p +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
labels = c('no gesture', 'gesture'),
name = '') +
scale_y_continuous(expand = c(0, 0)) +
xlab(NULL) +
ylab('Other gesture proportion')
# Look and feel:
other_p <- other_p +
theme_classic()
# Show:
other_p
Put them all into a three-column plot array, using the
patchwork package. We then also need to add titles and switch
off the legends for all plots except the last one, so that a single
legend serves the whole array.
# Add titles and switch off legends except for the last one:
gesture_p <- gesture_p +
ggtitle('a) All gestures') +
theme(legend.position = 'none',
plot.caption = element_text(hjust = 0))
iconic_p <- iconic_p +
ggtitle('b) Iconic gestures') +
theme(legend.position = 'none',
plot.caption = element_text(hjust = 0)) +
ylab(NULL)
other_p <- other_p +
ggtitle('c) Other gestures') +
ylab(NULL) +
theme(plot.caption = element_text(hjust = 0))
# Combine:
three_p <- gesture_p + iconic_p + other_p +
plot_layout(ncol = 3)
# Save combined plot outside:
ggsave(plot = three_p, filename = '../figures/barplots.pdf',
width = 7, height = 3.5)
Let’s make a density plot of
gesture_rate:
# Plot basics:
ges_prop_p <- by_word_all |>
ggplot(aes(x = gesture_rate, fill = type)) +
geom_density(alpha = 0.5)
# Axes and labels:
ges_prop_p <- ges_prop_p +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
name = '') +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 3),
breaks = seq(0, 3, 0.5)) +
scale_x_continuous(expand = c(0, 0),
limits = c(0, 1),
breaks = seq(0, 1, 0.25)) +
xlab('Gesture proportion of each word') +
ylab('Density')
# Themes:
ges_prop_p <- ges_prop_p +
theme_classic() +
theme(legend.position = 'bottom') +
ggtitle('a) Proportion of all gestures\nacross words')
# Show:
ges_prop_p
Same for
iconic_gesture_rate:
# Plot basics:
icon_prop_p <- by_word_all |>
ggplot(aes(x = iconic_gesture_rate, fill = type)) +
geom_density(alpha = 0.5)
# Axes and labels:
icon_prop_p <- icon_prop_p +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
name = '') +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 3),
breaks = seq(0, 3, 0.5)) +
scale_x_continuous(expand = c(0, 0),
limits = c(0, 1),
breaks = seq(0, 1, 0.25)) +
xlab('Iconic gesture proportion of each word') +
ylab('Density')
# Themes:
icon_prop_p <- icon_prop_p +
theme_classic() +
theme(legend.position = 'bottom') +
ggtitle('b) Proportion of iconic gestures\nacross words')
# Show:
icon_prop_p
Proportion of other gestures:
# Plot basics:
other_prop_p <- by_word_all |>
ggplot(aes(x = other_gesture_rate, fill = type)) +
geom_density(alpha = 0.5)
# Axes and labels:
other_prop_p <- other_prop_p +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
name = '') +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 3),
breaks = seq(0, 3, 0.5)) +
scale_x_continuous(expand = c(0, 0),
limits = c(0, 1),
breaks = seq(0, 1, 0.25)) +
xlab('Other gesture proportion of each word') +
ylab('Density')
# Themes:
other_prop_p <- other_prop_p +
theme_classic() +
theme(legend.position = 'bottom') +
ggtitle('c) Proportion of any other gestures\nacross words')
# Show:
other_prop_p
Put all three density plots into one big plot:
# Switch off legends except for the last one:
ges_prop_p <- ges_prop_p +
theme(legend.position = 'none',
plot.caption = element_text(hjust = 0))
icon_prop_p <- icon_prop_p +
theme(legend.position = 'none',
plot.caption = element_text(hjust = 0)) +
ylab(NULL)
other_prop_p <- other_prop_p +
ylab(NULL) +
theme(plot.caption = element_text(hjust = 0))
# Combine:
three_prop_p <- ges_prop_p + icon_prop_p + other_prop_p +
plot_layout(ncol = 3)
# Save combined plot outside:
ggsave(plot = three_prop_p, filename = '../figures/density_plots.pdf',
width = 12, height = 3.5)
Let’s see whether we can do a bar plot of all the words for
gesture_rate:
# Plot basics:
word_bars_p <- by_word_all |>
ggplot(aes(x = reorder(word, gesture_rate),
y = gesture_rate, fill = type)) +
geom_col(width = 0.75) +
geom_text(aes(label = no_of_eligible_tokens),
nudge_y = +0.025,
size = 2.7)
# Axes and labels:
word_bars_p <- word_bars_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.1)) +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
name = '',
labels = c('low iconicity word', 'high iconicity word')) +
ylab('Proportion of tokens with gesture') +
xlab(NULL)
# Look and feel:
word_bars_p <- word_bars_p +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = 'bottom')
# Show:
word_bars_p
# Save:
ggsave(plot = word_bars_p, filename = '../figures/by_word_bars.pdf',
width = 10, height = 4)
Same for iconic_gesture_rate:
# Plot basics:
iconic_bars_p <- by_word_all |>
ggplot(aes(x = reorder(word, iconic_gesture_rate),
y = iconic_gesture_rate, fill = type)) +
geom_col(width = 0.75) +
geom_text(aes(label = no_of_eligible_tokens),
nudge_y = +0.025,
size = 2.7)
# Axes and labels:
iconic_bars_p <- iconic_bars_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.1)) +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
name = '',
labels = c('low iconicity word', 'high iconicity word')) +
ylab('Proportion of tokens with iconic gesture') +
xlab(NULL)
# Look and feel:
iconic_bars_p <- iconic_bars_p +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = 'bottom')
# Show:
iconic_bars_p
# Save:
ggsave(plot = iconic_bars_p, filename = '../figures/by_word_iconic_bars.pdf',
width = 10, height = 4)
Same for other_gesture_rate:
# Plot basics:
other_bars_p <- by_word_all |>
ggplot(aes(x = reorder(word, other_gesture_rate),
y = other_gesture_rate, fill = type)) +
geom_col(width = 0.75) +
geom_text(aes(label = no_of_eligible_tokens),
nudge_y = +0.025,
size = 2.7)
# Axes and labels:
other_bars_p <- other_bars_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.1)) +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
name = '',
labels = c('low iconicity word', 'high iconicity word')) +
ylab('Proportion of tokens with non-iconic gestures') +
xlab(NULL)
# Look and feel:
other_bars_p <- other_bars_p +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = 'bottom')
# Show:
other_bars_p
# Save:
ggsave(plot = other_bars_p, filename = '../figures/by_word_other_bars.pdf',
width = 10, height = 4)
The \n in the level labels causes issues when extracting
coefficients from the model. So we'll build the model with cleaned level labels for that
predictor and will call the new variable type_cleaned. Let's also make
sure that the reference level is the non-iconic words:
df <- mutate(df,
type_cleaned = if_else(type == "non-iconic word", "non_iconic", "iconic"),
type_cleaned = factor(type_cleaned, levels = c('non_iconic', 'iconic')))
Specify weakly informative priors. For the beta coefficient, we follow the recommendation of Gelman et al. (2008) and use a Cauchy prior centered at 0 with scale 2.5.
weak_priors <- c(prior(student_t(3, 0, 2.5), class = Intercept),
prior(student_t(3, 0, 2.5), class = sd),
prior(cauchy(0, 2.5), class = b)) # Gelman et al. (2008)
Let’s fit a model with a fixed effect of type and
random intercepts for word and url, because we
have multiple data points for each of these grouping factors. We cannot
fit random slopes for type here because there is no
variation of type within word or
url: each word or video is always either iconic or non-iconic.
gesture_mdl <- brm(gesture ~
# Fixed effects:
1 + type_cleaned +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(gesture_mdl, file = '../models/gesture_mdl.RData')
Load and show model:
# Load:
load('../models/gesture_mdl.Rdata')
# Show:
gesture_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: gesture ~ 1 + type_cleaned + (1 | word) + (1 | url)
## Data: df (Number of observations: 1546)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.24 0.26 1.77 2.80 1.00 1782 3753
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.31 0.24 0.91 1.86 1.00 3247 4147
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.57 0.33 -0.07 1.24 1.00 4451 5189
## type_cleanediconic 0.92 0.49 -0.03 1.93 1.00 5739 5109
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Show the priors just to double check:
prior_summary(gesture_mdl)
## prior class coef group resp dpar nlpar lb ub
## cauchy(0, 2.5) b
## cauchy(0, 2.5) b type_cleanediconic
## student_t(3, 0, 2.5) Intercept
## student_t(3, 0, 2.5) sd 0
## student_t(3, 0, 2.5) sd url 0
## student_t(3, 0, 2.5) sd Intercept url 0
## student_t(3, 0, 2.5) sd word 0
## student_t(3, 0, 2.5) sd Intercept word 0
## source
## user
## (vectorized)
## user
## user
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
Show the posterior:
gesture_mdl_posts <- posterior_samples(gesture_mdl)
## Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
## recommended alternatives.
Make a plot of the posterior distribution of the type
coefficient:
# Plot basics:
gesture_posts_p <- gesture_mdl_posts |>
ggplot(aes(x = b_type_cleanediconic)) +
geom_density(fill = 'purple3') +
geom_vline(xintercept = 0, linetype = 'dashed')
# Axes and labels:
gesture_posts_p <- gesture_posts_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.2),
breaks = seq(0, 1.2, 0.2)) +
scale_x_continuous(limits = c(-3, 3),
breaks = seq(-3, 3, 1))
# Look and feel:
gesture_posts_p <- gesture_posts_p +
ylab('Density of posterior samples') +
xlab('Posterior estimate of coefficient') +
theme_classic()
# Show and save:
gesture_posts_p
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).
ggsave(plot = gesture_posts_p, filename = '../figures/gesture_posterior.pdf',
width = 4.7, height = 3)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).
This shows that, given the model formula, priors, and data, most of the
plausible type effects are positive: iconic words are
more likely to co-occur with gesture. Because the
posterior distribution crosses zero, a negative effect remains
possible, but it is improbable given where
the bulk of the posterior mass lies.
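Because the model uses a logit link, the coefficients are on the log-odds scale and can be back-transformed to probabilities with the inverse logit (in R, plogis). A minimal sketch of that conversion, written in Python for illustration, using the posterior means from the summary above (note that these probabilities condition on the random effects being zero):

```python
import math

def inv_logit(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1 / (1 + math.exp(-x))

# Posterior means from the model summary above:
intercept = 0.57       # non-iconic words (reference level), log-odds
b_iconic = 0.92        # shift for iconic words, log-odds

p_non_iconic = inv_logit(intercept)          # probability of gesture, non-iconic word
p_iconic = inv_logit(intercept + b_iconic)   # probability of gesture, iconic word

print(round(p_non_iconic, 2), round(p_iconic, 2))  # 0.64 0.82
```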
Next, calculate the actual posterior probability of the effect being positive (= of the same sign), which simply quantifies what proportion of the area in the distribution above lies to the right of the dashed line.
hypothesis(gesture_mdl, 'type_cleanediconic > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) > 0 0.92 0.49 0.12 1.74 32.06
## Post.Prob Star
## 1 0.97 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
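The Post.Prob value above is just the proportion of posterior draws that are greater than zero. A minimal sketch of that computation, in Python for illustration; the draws here are simulated from a normal approximation to the posterior (mean 0.92, SD 0.49 from the summary above), not the real MCMC draws:

```python
import random

random.seed(42)

# Stand-in for the 8,000 real posterior draws of the coefficient,
# approximated as normal with the posterior mean and SD from above:
draws = [random.gauss(0.92, 0.49) for _ in range(8000)]

# Posterior probability that the effect is positive
# = proportion of draws to the right of zero:
post_prob = sum(d > 0 for d in draws) / len(draws)
print(round(post_prob, 2))  # close to the 0.97 reported by hypothesis()
```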
Same for iconic gestures:
iconic_mdl <- brm(iconic_gesture ~
# Fixed effects:
1 + type_cleaned +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(iconic_mdl, file = '../models/iconic_mdl.RData')
Load and show model:
# Load model:
load('../models/iconic_mdl.RData')
# Show model:
iconic_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: iconic_gesture ~ 1 + type_cleaned + (1 | word) + (1 | url)
## Data: df (Number of observations: 1545)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.23 0.33 1.64 2.92 1.00 2026 3593
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.67 0.32 1.12 2.40 1.00 3090 4290
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -3.95 0.55 -5.10 -2.95 1.00 2927 4096
## type_cleanediconic 1.51 0.63 0.27 2.76 1.00 3921 4715
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Show the priors to double-check:
prior_summary(iconic_mdl)
## prior class coef group resp dpar nlpar lb ub
## cauchy(0, 2.5) b
## cauchy(0, 2.5) b type_cleanediconic
## student_t(3, 0, 2.5) Intercept
## student_t(3, 0, 2.5) sd 0
## student_t(3, 0, 2.5) sd url 0
## student_t(3, 0, 2.5) sd Intercept url 0
## student_t(3, 0, 2.5) sd word 0
## student_t(3, 0, 2.5) sd Intercept word 0
## source
## user
## (vectorized)
## user
## user
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
Show the posterior of the type coefficient. First,
extract the posterior samples:
iconic_mdl_posts <- posterior_samples(iconic_mdl)
## Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
## recommended alternatives.
Make a plot of the posterior distribution of the coefficient:
# Plot basics:
iconic_posts_p <- iconic_mdl_posts |>
ggplot(aes(x = b_type_cleanediconic)) +
geom_density(fill = 'purple3') +
geom_vline(xintercept = 0, linetype = 'dashed')
# Axes and labels:
iconic_posts_p <- iconic_posts_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.2),
breaks = seq(0, 1.2, 0.2)) +
scale_x_continuous(limits = c(-3, 3),
breaks = seq(-3, 3, 1))
# Look and feel:
iconic_posts_p <- iconic_posts_p +
ylab('Density of posterior samples') +
xlab('Posterior estimate of coefficient') +
theme_classic()
# Show and save:
iconic_posts_p
## Warning: Removed 88 rows containing non-finite outside the scale range
## (`stat_density()`).
ggsave(plot = iconic_posts_p, filename = '../figures/iconic_gesture_posterior.pdf',
width = 4.7, height = 3)
## Warning: Removed 88 rows containing non-finite outside the scale range
## (`stat_density()`).
Check whether the main effect of iconicity is likely to be of the same sign. This is the posterior probability of iconic words having a higher iconic gesture rate:
hypothesis(iconic_mdl, 'type_cleanediconic > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) > 0 1.51 0.63 0.5 2.56 118.4
## Post.Prob Star
## 1 0.99 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
The final model looks at only the non-iconic (= “other”) gestures:
other_mdl <- brm(non_iconic_gesture ~
# Fixed effects:
1 + type_cleaned +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = filter(df, !is.na(non_iconic_gesture)),
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(other_mdl, file = '../models/other_mdl.RData')
Load and show model:
# Load model:
load('../models/other_mdl.RData')
# Show model:
other_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: non_iconic_gesture ~ 1 + type_cleaned + (1 | word) + (1 | url)
## Data: filter(df, !is.na(non_iconic_gesture)) (Number of observations: 1335)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 736)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.32 0.29 1.80 2.93 1.00 1615 3180
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.21 0.24 0.79 1.74 1.00 2953 4807
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.26 0.31 -0.88 0.35 1.00 3801 4763
## type_cleanediconic -0.46 0.48 -1.41 0.47 1.00 5270 5613
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Show the priors to double-check:
prior_summary(other_mdl)
## prior class coef group resp dpar nlpar lb ub
## cauchy(0, 2.5) b
## cauchy(0, 2.5) b type_cleanediconic
## student_t(3, 0, 2.5) Intercept
## student_t(3, 0, 2.5) sd 0
## student_t(3, 0, 2.5) sd url 0
## student_t(3, 0, 2.5) sd Intercept url 0
## student_t(3, 0, 2.5) sd word 0
## student_t(3, 0, 2.5) sd Intercept word 0
## source
## user
## (vectorized)
## user
## user
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
Show the posterior of the type coefficient. First,
extract the posterior samples:
other_mdl_posts <- posterior_samples(other_mdl)
## Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
## recommended alternatives.
Make a plot of the posterior distribution of the coefficient:
# Plot basics:
other_posts_p <- other_mdl_posts |>
ggplot(aes(x = b_type_cleanediconic)) +
geom_density(fill = 'purple3') +
geom_vline(xintercept = 0, linetype = 'dashed')
# Axes and labels:
other_posts_p <- other_posts_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.2),
breaks = seq(0, 1.2, 0.2)) +
scale_x_continuous(limits = c(-3, 3),
breaks = seq(-3, 3, 1))
# Look and feel:
other_posts_p <- other_posts_p +
ylab('Density of posterior samples') +
xlab('Posterior estimate of coefficient') +
theme_classic()
# Show and save:
other_posts_p
ggsave(plot = other_posts_p, filename = '../figures/other_gesture_posterior.pdf',
width = 4.7, height = 3)
Check what proportion of the posterior samples is of the same (negative) sign:
hypothesis(other_mdl, 'type_cleanediconic < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) < 0 -0.46 0.48 -1.26 0.31 4.94
## Post.Prob Star
## 1 0.83
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
To assess whether the models make reasonable assumptions about the underlying data-generating process, we can perform posterior predictive checks: simulate new data from the fitted models and compare it to the observed data.
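The logic behind a posterior predictive check for Bernoulli data, sketched minimally in Python for illustration (a toy model with made-up numbers, not the real posterior): for each posterior draw of the success probability, simulate a replicate dataset of the same size and compare a summary statistic (here, the count of 1s) to the observed value.

```python
import random

random.seed(1)

n_obs = 1546          # number of observations, matching the gesture model
observed_ones = 1000  # hypothetical observed count of gesture == 1

# Hypothetical posterior draws of the overall success probability:
p_draws = [random.uniform(0.62, 0.68) for _ in range(100)]

# Simulate one replicate dataset per posterior draw
# and record its count of 1s:
replicated_counts = []
for p in p_draws:
    count = sum(random.random() < p for _ in range(n_obs))
    replicated_counts.append(count)

# If the observed count sits inside the bulk of the replicated counts,
# the model captures this aspect of the data:
lo, hi = min(replicated_counts), max(replicated_counts)
print(lo <= observed_ones <= hi)
```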
Regular bar plots:
# Any gesture model:
pp_check(gesture_mdl, type = "bars", ndraws = 100)
# Iconic gesture model:
pp_check(iconic_mdl, type = "bars", ndraws = 100)
# Other gesture model:
pp_check(other_mdl, type = "bars", ndraws = 100)
Grouped bar plots:
# Any gesture model:
pp_check(gesture_mdl, type = "bars_grouped", ndraws = 100,
group = 'word')
ggsave('../figures/gesture_mdl_pp_check.pdf',
width = 14, height = 12)
# Iconic gesture model:
pp_check(iconic_mdl, type = "bars_grouped", ndraws = 100,
group = 'word')
ggsave('../figures/iconic_mdl_pp_check.pdf',
width = 14, height = 12)
# Other gesture model:
pp_check(other_mdl, type = "bars_grouped", ndraws = 100,
group = 'word')
ggsave('../figures/other_mdl_pp_check.pdf',
width = 14, height = 12)
ECDF (empirical cumulative distribution function):
# Any gesture model:
pp_check(gesture_mdl, type = "ecdf_overlay", ndraws = 100)
# Iconic gesture model:
pp_check(iconic_mdl, type = "ecdf_overlay", ndraws = 100)
# Other gesture model:
pp_check(other_mdl, type = "ecdf_overlay", ndraws = 100)
This completes the main analysis.
Since word frequency has been found to influence gesture production rates, let’s incorporate frequency next, using the SUBTLEX-US norms:
# load, rename, and log-transform
SUBTL <- read_csv('../data/SUBTLEX_US_with_POS.csv') |>
rename(freq = FREQcount) |>
rename(POS = Dom_PoS_SUBTLEX) |>
mutate(log_freq = log10(freq))
Merge this into by_word_all:
by_word_all <- left_join(by_word_all,
select(SUBTL, Word, POS, freq, log_freq),
by = c('word' = 'Word'))
Are iconic words different from non-iconic ones in terms of frequency?
by_word_all |>
mutate(type = as.character(type),
type = if_else(type == 'non-iconic word', 'low iconicity\nword', 'high iconicity\nword'),
type = factor(type, levels = c('low iconicity\nword', 'high iconicity\nword'))) |>
ggplot(aes(x = type, y = log_freq, fill = type)) +
geom_boxplot(width = 0.5) +
scale_fill_manual(values = c('steelblue', 'goldenrod3'),
name = '') +
xlab(NULL) +
ylab('Log10 frequency') +
scale_y_continuous(limits = c(0, 5)) +
theme_classic() +
theme(legend.position = 'none') +
theme(axis.text.x = element_text(size = 10, face = 'bold'),
axis.title.y = element_text(margin = margin(r = 15),
size = 12, face = 'bold'))
ggsave(filename = '../figures/barplot_frequency_difference.pdf',
width = 3.5, height = 4.2)
Looks like a big difference. Check the descriptive statistics for raw and log frequency:
by_word_all |>
group_by(type) |>
summarize(freq_M = mean(freq, na.rm = TRUE),
log_freq_M = mean(log_freq, na.rm = TRUE))
## # A tibble: 2 × 3
## type freq_M log_freq_M
## <fct> <dbl> <dbl>
## 1 non-iconic word 7408. 2.64
## 2 iconic word 160. 1.76
Check how big this difference is in terms of effect size (Cohen’s d):
cohen.d(log_freq ~ type, data = by_word_all)
##
## Cohen's d
##
## d estimate: 0.9058493 (large)
## 95 percent confidence interval:
## lower upper
## 0.2743632 1.5373355
Large effect size.
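For reference, Cohen’s d is the mean difference divided by the pooled standard deviation, which is what effsize computes on the real data. A sketch of the computation in Python, with made-up group values: the group means happen to match the log-frequency means above, but the spreads are invented, so the resulting d is illustrative only.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d: mean difference over the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Made-up log frequencies for two small groups (means 2.64 and 1.76):
non_iconic = [2.1, 2.9, 2.5, 3.0, 2.7]
iconic = [1.5, 1.9, 1.7, 2.1, 1.6]
print(round(cohens_d(non_iconic, iconic), 2))
```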
Quickly check whether frequency is correlated with any of the gesture rates. We use Spearman’s rho because the rates are proportions (though it matters little here, and Pearson’s r would be fine as well):
with(by_word_all, cor.test(log_freq, gesture_rate, method = 'spearman'))
## Warning in cor.test.default(log_freq, gesture_rate, method = "spearman"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: log_freq and gesture_rate
## S = 16678, p-value = 0.5191
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.09865885
with(by_word_all, cor.test(log_freq, iconic_gesture_rate, method = 'spearman'))
## Warning in cor.test.default(log_freq, iconic_gesture_rate, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: log_freq and iconic_gesture_rate
## S = 15267, p-value = 0.97
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.005764155
with(by_word_all, cor.test(log_freq, other_gesture_rate, method = 'spearman'))
## Warning in cor.test.default(log_freq, other_gesture_rate, method = "spearman"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: log_freq and other_gesture_rate
## S = 17487, p-value = 0.3191
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.1519454
Perhaps Pearson’s r correlations are easier to interpret:
with(by_word_all, cor.test(log_freq, gesture_rate, method = 'pearson'))
##
## Pearson's product-moment correlation
##
## data: log_freq and gesture_rate
## t = -0.52008, df = 43, p-value = 0.6057
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3641455 0.2195670
## sample estimates:
## cor
## -0.07906252
with(by_word_all, cor.test(log_freq, iconic_gesture_rate, method = 'pearson'))
##
## Pearson's product-moment correlation
##
## data: log_freq and iconic_gesture_rate
## t = -0.075748, df = 43, p-value = 0.94
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3040537 0.2829425
## sample estimates:
## cor
## -0.01155072
with(by_word_all, cor.test(log_freq, other_gesture_rate, method = 'pearson'))
##
## Pearson's product-moment correlation
##
## data: log_freq and other_gesture_rate
## t = -0.5177, df = 43, p-value = 0.6073
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3638327 0.2199102
## sample estimates:
## cor
## -0.07870418
Hardly any correlation at all. These correlations may nonetheless be worth reporting, since they are easy for a broad readership to understand, as long as it is made clear that they are computed on by-word averages.
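Spearman’s rho is simply Pearson’s r computed on the ranks (modulo tie handling, which is why the warnings above mention ties). A minimal sketch in Python, with toy data and no tie handling:

```python
def ranks(values):
    """Rank values from 1 to n (no tie handling, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(x, y):
    """Pearson's product-moment correlation, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho = Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy example: a monotone but non-linear relation gives rho of 1:
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y))
```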
Make a plot of this, also incorporating word iconicity
(type). For heuristic purposes we’ll also add a linear
model to see whether we can spot any obvious relations that differ
between iconic and non-iconic words:
by_word_all |>
ggplot(aes(x = log_freq, y = gesture_rate,
color = type, label = word)) +
geom_text() +
xlab('Log frequency') +
ylab('Gesture proportion') +
# geom_smooth(method = 'lm') +
scale_color_manual(values = c('steelblue', 'goldenrod3'),
name = '') +
theme_classic() +
theme(legend.position = 'bottom',
axis.title = element_text(size = 12, face = 'bold'),
axis.title.y = element_text(margin = margin(r = 15)),
axis.title.x = element_text(margin = margin(t = 12)))
# Save:
ggsave('../figures/by_word_correlation_frequency_gesture_proportion.pdf',
width = 6.5, height = 5)
The massive overlap and wide scatter suggest that there’s not much of a relation.
by_word_all |>
ggplot(aes(x = log_freq, y = iconic_gesture_rate,
color = type, label = word)) +
geom_text() +
xlab('Log frequency') +
ylab('Iconic gesture proportion') +
# geom_smooth(method = 'lm') +
scale_color_manual(values = c('steelblue', 'goldenrod3'),
name = '') +
theme_classic() +
theme(legend.position = 'bottom')
# Save:
ggsave('../figures/by_word_correlation_frequency_iconic_gesture_proportion.pdf',
width = 6.5, height = 5)
At most a very weak positive relationship, with more frequent words being slightly more likely to be accompanied by iconic gestures.
What happens if we incorporate frequency into the main analysis? For
this we also need to add it to df:
df <- left_join(df, select(SUBTL, Word, freq, log_freq),
by = c('word' = 'Word'))
The model would have problems if type contained the paragraph
break we included above for data visualization purposes; this is why we created type_cleaned earlier, so the recode below is no longer needed and is kept only for reference.
# df <- mutate(df,
# type = as.character(type),
# type = ifelse(type == 'non-iconic \nword',
# 'non_iconic',
# 'iconic'),
# type = factor(type, levels = c('non_iconic', 'iconic')))
Check:
gesture_freq_mdl <- brm(gesture ~
# Fixed effects:
1 + type_cleaned + log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(gesture_freq_mdl, file = '../models/gesture_freq_mdl.RData')
Load:
load('../models/gesture_freq_mdl.RData')
Show model:
gesture_freq_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: gesture ~ 1 + type_cleaned + log_freq + (1 | word) + (1 | url)
## Data: df (Number of observations: 1546)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.24 0.26 1.79 2.79 1.00 1945 3435
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.34 0.24 0.93 1.88 1.00 3061 5027
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.50 0.75 -0.95 1.97 1.00 3398 4431
## type_cleanediconic 0.94 0.56 -0.11 2.08 1.00 3803 5013
## log_freq 0.03 0.25 -0.46 0.52 1.00 3103 4584
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Check effects:
hypothesis(gesture_freq_mdl, 'type_cleanediconic > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) > 0 0.94 0.56 0.05 1.87 22.6
## Post.Prob Star
## 1 0.96 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_freq_mdl, 'log_freq > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (log_freq) > 0 0.03 0.25 -0.38 0.43 1.22 0.55
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Same for iconic gestures:
iconic_freq_mdl <- brm(iconic_gesture ~
# Fixed effects:
1 + type_cleaned + log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(iconic_freq_mdl, file = '../models/iconic_freq_mdl.RData')
Load:
load('../models/iconic_freq_mdl.RData')
Show model:
iconic_freq_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: iconic_gesture ~ 1 + type_cleaned + log_freq + (1 | word) + (1 | url)
## Data: df (Number of observations: 1545)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.23 0.34 1.62 2.96 1.00 1940 3523
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.70 0.34 1.15 2.49 1.00 2439 4082
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -4.87 1.13 -7.20 -2.82 1.00 2895 3972
## type_cleanediconic 1.89 0.76 0.46 3.47 1.00 3257 5117
## log_freq 0.32 0.34 -0.32 1.01 1.00 2987 4360
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Check effects:
hypothesis(iconic_freq_mdl, 'type_cleanediconic > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) > 0 1.89 0.76 0.66 3.19 209.53
## Post.Prob Star
## 1 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_freq_mdl, 'log_freq > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (log_freq) > 0 0.32 0.34 -0.22 0.88 5.24 0.84
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Next, look at the non-iconic (“other”) gestures:
other_freq_mdl <- brm(non_iconic_gesture ~
# Fixed effects:
1 + type_cleaned + log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = filter(df, !is.na(non_iconic_gesture)),
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(other_freq_mdl, file = '../models/other_freq_mdl.RData')
Load:
load('../models/other_freq_mdl.RData')
Show model:
other_freq_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: non_iconic_gesture ~ 1 + type_cleaned + log_freq + (1 | word) + (1 | url)
## Data: filter(df, !is.na(non_iconic_gesture)) (Number of observations: 1335)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 736)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.35 0.29 1.83 2.96 1.00 1644 3065
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.23 0.24 0.82 1.77 1.00 2564 4419
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.43 0.74 -1.87 0.99 1.00 4402 4934
## type_cleanediconic -0.40 0.54 -1.47 0.67 1.00 4611 5034
## log_freq 0.06 0.24 -0.42 0.53 1.00 3849 5051
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Check effects:
hypothesis(other_freq_mdl, 'type_cleanediconic < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) < 0 -0.4 0.54 -1.31 0.48 3.35
## Post.Prob Star
## 1 0.77
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_freq_mdl, 'log_freq > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (log_freq) > 0 0.06 0.24 -0.33 0.45 1.52 0.6
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Show the posterior:
gesture_mdl_posts <- posterior_samples(gesture_freq_mdl)
## Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
## recommended alternatives.
Make a plot of the posterior distribution of the type
coefficient:
# Plot basics:
gesture_posts_p <- gesture_mdl_posts |>
ggplot(aes(x = b_type_cleanediconic)) +
geom_density(fill = 'purple3') +
geom_vline(xintercept = 0, linetype = 'dashed')
# Axes and labels:
gesture_posts_p <- gesture_posts_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.2),
breaks = seq(0, 1.2, 0.2)) +
scale_x_continuous(limits = c(-3.5, 3.5),
breaks = seq(-3, 3, 1))
# Look and feel:
gesture_posts_p <- gesture_posts_p +
ylab('Density') +
xlab('Posterior estimate') +
theme_classic()
# Show and save:
gesture_posts_p
ggsave(plot = gesture_posts_p, filename = '../figures/gesture_posterior_with_freq.pdf',
width = 4.7, height = 3)
Show the posterior of the type coefficient. First,
extract the posterior samples:
iconic_mdl_posts <- posterior_samples(iconic_freq_mdl)
## Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
## recommended alternatives.
Make a plot of the posterior distribution of the coefficient:
# Plot basics:
iconic_posts_p <- iconic_mdl_posts |>
ggplot(aes(x = b_type_cleanediconic)) +
geom_density(fill = 'purple3') +
geom_vline(xintercept = 0, linetype = 'dashed')
# Axes and labels:
iconic_posts_p <- iconic_posts_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.2),
breaks = seq(0, 1.2, 0.2)) +
scale_x_continuous(limits = c(-3.5, 3.5),
breaks = seq(-3, 3, 1))
# Look and feel:
iconic_posts_p <- iconic_posts_p +
ylab('Density of posterior samples') +
xlab('Posterior estimate') +
theme_classic()
# Show and save:
iconic_posts_p
## Warning: Removed 186 rows containing non-finite outside the scale range
## (`stat_density()`).
ggsave(plot = iconic_posts_p, filename = '../figures/iconic_gesture_posterior_with_freq.pdf',
width = 4.7, height = 3)
## Warning: Removed 186 rows containing non-finite outside the scale range
## (`stat_density()`).
And the same for the non-iconic (“other”) gesture model. First, extract the posterior samples:
other_mdl_posts <- posterior_samples(other_freq_mdl)
## Warning: Method 'posterior_samples' is deprecated. Please see ?as_draws for
## recommended alternatives.
Make a plot of the posterior distribution of the coefficient:
# Plot basics:
other_posts_p <- other_mdl_posts |>
ggplot(aes(x = b_type_cleanediconic)) +
geom_density(fill = 'purple3') +
geom_vline(xintercept = 0, linetype = 'dashed')
# Axes and labels:
other_posts_p <- other_posts_p +
scale_y_continuous(expand = c(0, 0),
limits = c(0, 1.2),
breaks = seq(0, 1.2, 0.2)) +
scale_x_continuous(limits = c(-3.5, 3.5),
breaks = seq(-3, 3, 1))
# Look and feel:
other_posts_p <- other_posts_p +
ylab('Density of posterior samples') +
xlab('Posterior estimate') +
theme_classic()
# Show and save:
other_posts_p
ggsave(plot = other_posts_p, filename = '../figures/other_gesture_posterior_with_freq.pdf',
width = 4.7, height = 3)
Put them all together into one plot:
gesture_posts_p <- gesture_posts_p +
ggtitle('a) Any gestures') +
theme(plot.title = element_text(face = 'bold', size = 12))
iconic_posts_p <- iconic_posts_p + ylab(NULL) +
ggtitle('b) Iconic gestures') +
theme(plot.title = element_text(face = 'bold', size = 12))
other_posts_p <- other_posts_p + ylab(NULL) +
ggtitle('c) Non-iconic gestures') +
theme(plot.title = element_text(face = 'bold', size = 12))
three_p <- gesture_posts_p + plot_spacer() + iconic_posts_p + plot_spacer() + other_posts_p +
plot_layout(widths = c(4, 0.1, 4, 0.1, 4))
ggsave(plot = three_p,
filename = '../figures/all_posteriors_with_frequency.pdf',
width = 9.5, height = 2.5)
## Warning: Removed 186 rows containing non-finite outside the scale range
## (`stat_density()`).
First, count how many unique adjectives and verbs there are:
distinct(df, word, POS) |>
count(POS)
## # A tibble: 2 × 2
## POS n
## <chr> <int>
## 1 adjective 15
## 2 verb 30
Check the average proportion of all gestures, separately for word type (iconic versus non-iconic) and POS:
# Save:
all_gesture_by_type_POS <- df |>
count(POS, type, gesture) |>
group_by(POS, type) |>
mutate(proportion = n / sum(n),
percentage = str_c(round(proportion, 3) * 100, '%'))
# Show:
all_gesture_by_type_POS
## # A tibble: 8 × 6
## # Groups: POS, type [4]
## POS type gesture n proportion percentage
## <chr> <fct> <dbl> <int> <dbl> <chr>
## 1 adjective non-iconic word 0 91 0.464 46.4%
## 2 adjective non-iconic word 1 105 0.536 53.6%
## 3 adjective iconic word 0 77 0.407 40.7%
## 4 adjective iconic word 1 112 0.593 59.3%
## 5 verb non-iconic word 0 423 0.495 49.5%
## 6 verb non-iconic word 1 432 0.505 50.5%
## 7 verb iconic word 0 73 0.239 23.9%
## 8 verb iconic word 1 233 0.761 76.1%
Next, iconic gestures (over all gestures):
# Save:
iconic_gesture_by_type_POS_over_gestures <- df |>
count(POS, type, iconic_gesture) |>
filter(!is.na(iconic_gesture)) |>
group_by(POS, type) |>
mutate(proportion = n / sum(n),
percentage = str_c(round(proportion, 3) * 100, '%'))
# Show:
iconic_gesture_by_type_POS_over_gestures
## # A tibble: 8 × 6
## # Groups: POS, type [4]
## POS type iconic_gesture n proportion percentage
## <chr> <fct> <dbl> <int> <dbl> <chr>
## 1 adjective non-iconic word 0 171 0.872 87.2%
## 2 adjective non-iconic word 1 25 0.128 12.8%
## 3 adjective iconic word 0 173 0.915 91.5%
## 4 adjective iconic word 1 16 0.0847 8.5%
## 5 verb non-iconic word 0 778 0.911 91.1%
## 6 verb non-iconic word 1 76 0.0890 8.9%
## 7 verb iconic word 0 213 0.696 69.6%
## 8 verb iconic word 1 93 0.304 30.4%
Next, all other gestures (over all gestures):
# Save:
other_gesture_by_type_POS_over_gestures <- df |>
count(POS, type, non_iconic_gesture) |>
filter(!is.na(non_iconic_gesture)) |>
group_by(POS, type) |>
mutate(proportion = n / sum(n),
percentage = str_c(round(proportion, 3) * 100, '%'))
# Show:
other_gesture_by_type_POS_over_gestures
## # A tibble: 8 × 6
## # Groups: POS, type [4]
## POS type non_iconic_gesture n proportion percentage
## <chr> <fct> <chr> <int> <dbl> <chr>
## 1 adjective non-iconic word gesture (non-iconic) 80 0.468 46.8%
## 2 adjective non-iconic word no gesture 91 0.532 53.2%
## 3 adjective iconic word gesture (non-iconic) 96 0.555 55.5%
## 4 adjective iconic word no gesture 77 0.445 44.5%
## 5 verb non-iconic word gesture (non-iconic) 355 0.456 45.6%
## 6 verb non-iconic word no gesture 423 0.544 54.4%
## 7 verb iconic word gesture (non-iconic) 140 0.657 65.7%
## 8 verb iconic word no gesture 73 0.343 34.3%
To calculate the gesture rate over eligible tokens, we need to first calculate the number of eligible tokens.
# Save:
eligible_tokens_by_type_POS <- df |>
count(POS, type) |>
group_by(POS, type) |>
summarise(total = sum(n))
## `summarise()` has grouped output by 'POS'. You can override using the `.groups`
## argument.
# Show:
eligible_tokens_by_type_POS
## # A tibble: 4 × 3
## # Groups: POS [2]
## POS type total
## <chr> <fct> <int>
## 1 adjective non-iconic word 196
## 2 adjective iconic word 189
## 3 verb non-iconic word 855
## 4 verb iconic word 306
Now, calculate the number of iconic gestures over eligible tokens:
# Save:
iconic_gesture_by_type_POS_over_eligible_tokens <- df |>
count(POS, type, iconic_gesture) |>
left_join(eligible_tokens_by_type_POS, by = c("POS", "type")) |>
filter(iconic_gesture == 1) |>
group_by(POS, type) |>
mutate(proportion = n / total,
percentage = str_c(round(proportion, 3) * 100, '%'))
# Show:
iconic_gesture_by_type_POS_over_eligible_tokens
## # A tibble: 4 × 7
## # Groups: POS, type [4]
## POS type iconic_gesture n total proportion percentage
## <chr> <fct> <dbl> <int> <int> <dbl> <chr>
## 1 adjective non-iconic word 1 25 196 0.128 12.8%
## 2 adjective iconic word 1 16 189 0.0847 8.5%
## 3 verb non-iconic word 1 76 855 0.0889 8.9%
## 4 verb iconic word 1 93 306 0.304 30.4%
Now, other gestures (over eligible tokens):
# Save:
other_gesture_by_type_POS_over_eligible_tokens <- df |>
count(POS, type, non_iconic_gesture) |>
left_join(eligible_tokens_by_type_POS, by = c("POS", "type")) |>
filter(non_iconic_gesture == 'gesture (non-iconic)') |>
group_by(POS, type) |>
mutate(proportion = n / total,
percentage = str_c(round(proportion, 3) * 100, '%'))
# Show:
other_gesture_by_type_POS_over_eligible_tokens
## # A tibble: 4 × 7
## # Groups: POS, type [4]
## POS type non_iconic_gesture n total proportion percentage
## <chr> <fct> <chr> <int> <int> <dbl> <chr>
## 1 adjective non-iconic word gesture (non-icon… 80 196 0.408 40.8%
## 2 adjective iconic word gesture (non-icon… 96 189 0.508 50.8%
## 3 verb non-iconic word gesture (non-icon… 355 855 0.415 41.5%
## 4 verb iconic word gesture (non-icon… 140 306 0.458 45.8%
Now let’s calculate the average gesture rates by word, then by word type/POS (using ‘by_word_all’).
First, overall gesture rate:
# Group by type and POS and calculate the average gesture_rate for each group:
average_gesture_rate_POS <- by_word_all |>
group_by(type, POS) |>
summarise(avg_gesture_rate = mean(gesture_rate, na.rm = TRUE))
## `summarise()` has grouped output by 'type'. You can override using the
## `.groups` argument.
# View the result
print(average_gesture_rate_POS)
## # A tibble: 4 × 3
## # Groups: type [2]
## type POS avg_gesture_rate
## <fct> <chr> <dbl>
## 1 non-iconic word Adjective 0.490
## 2 non-iconic word Verb 0.583
## 3 iconic word Adjective 0.551
## 4 iconic word Verb 0.768
Then, iconic gesture rate:
# Group by type and POS and calculate the average iconic_gesture_rate for each group:
average_iconic_gesture_rate_POS <- by_word_all |>
group_by(type, POS) |>
summarise(avg_gesture_rate = mean(iconic_gesture_rate, na.rm = TRUE))
## `summarise()` has grouped output by 'type'. You can override using the
## `.groups` argument.
# View the result
print(average_iconic_gesture_rate_POS)
## # A tibble: 4 × 3
## # Groups: type [2]
## type POS avg_gesture_rate
## <fct> <chr> <dbl>
## 1 non-iconic word Adjective 0.125
## 2 non-iconic word Verb 0.102
## 3 iconic word Adjective 0.114
## 4 iconic word Verb 0.310
And finally, other gesture rate:
# Group by type and POS and calculate the average other_gesture_rate for each group:
average_other_gesture_rate_POS <- by_word_all |>
group_by(type, POS) |>
summarise(avg_gesture_rate = mean(other_gesture_rate, na.rm = TRUE))
## `summarise()` has grouped output by 'type'. You can override using the
## `.groups` argument.
# View the result
print(average_other_gesture_rate_POS)
## # A tibble: 4 × 3
## # Groups: type [2]
## type POS avg_gesture_rate
## <fct> <chr> <dbl>
## 1 non-iconic word Adjective 0.365
## 2 non-iconic word Verb 0.479
## 3 iconic word Adjective 0.437
## 4 iconic word Verb 0.458
Let’s fit a model with a fixed effect of type, now with the POS effect added, and random intercepts for word and url, because we have multiple data points for each of these grouping factors. We cannot fit random slopes for type here because there is no variation of type within word or url: each word or video is always either iconic or non-iconic.
This time around, it makes sense to sum-code POS and type (contrast codes of -1 and +1) so that we can interpret main effects more easily in the presence of the interaction.
# Make into factor:
df <- mutate(df,
type_c = factor(type, levels = c('iconic word', 'non-iconic word')),
POS_c = factor(POS, levels = c('verb', 'adjective')))
# Add sum-coding contrast coding scheme for 2 categories:
contrasts(df$type_c) <- contr.sum(2)
contrasts(df$POS_c) <- contr.sum(2)
# Check:
contrasts(df$type_c)
## [,1]
## iconic word 1
## non-iconic word -1
contrasts(df$POS_c)
## [,1]
## verb 1
## adjective -1
Now, fit the model:
gesture_POS_mdl <- brm(gesture ~
# Fixed effects:
1 + type_c + POS_c + type_c:POS_c +
log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(gesture_POS_mdl, file = '../models/gesture_POS_mdl.RData')
Load and show model:
# Load:
load('../models/gesture_POS_mdl.RData')
# Show:
gesture_POS_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: gesture ~ 1 + type_c + POS_c + type_c:POS_c + log_freq + (1 | word) + (1 | url)
## Data: df (Number of observations: 1546)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.27 0.26 1.80 2.80 1.00 1625 3623
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.32 0.25 0.89 1.87 1.00 2289 4169
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.90 0.64 -0.34 2.17 1.00 3203 4554
## type_c1 0.43 0.31 -0.16 1.06 1.00 3076 4683
## POS_c1 0.44 0.27 -0.09 0.98 1.00 3734 4796
## log_freq 0.00 0.25 -0.51 0.51 1.00 2684 4036
## type_c1:POS_c1 0.27 0.28 -0.26 0.82 1.00 3184 4327
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
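To make the sum coding concrete: with +1/-1 codes, each coefficient is half the difference between the two levels on the logit scale. A quick sketch using the rounded posterior means from the output above (holding POS_c at 0; the log_freq estimate is ~0 here anyway):

```r
# With +1/-1 sum coding, a coefficient is half the difference between
# the two levels. Rounded posterior means from the model output above:
b0 <- 0.90      # intercept (grand mean logit)
b_type <- 0.43  # type_c1 coefficient
logit_iconic <- b0 + b_type * 1      # type_c = +1 (iconic word)
logit_noniconic <- b0 + b_type * -1  # type_c = -1 (non-iconic word)
logit_iconic - logit_noniconic       # = 2 * b_type = 0.86
```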
This calculates the actual posterior probability of the effect being positive (= of the hypothesized sign), which essentially pins a number on the proportion of the posterior distribution above that lies to the right of the dashed line.
hypothesis(gesture_POS_mdl, 'type_c1 > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (type_c1) > 0 0.43 0.31 -0.07 0.96 11.03 0.92
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_POS_mdl, 'POS_c1 > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (POS_c1) > 0 0.44 0.27 -0.01 0.88 18 0.95
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_POS_mdl, 'type_c1:POS_c1 > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_c1:POS_c1) > 0 0.27 0.28 -0.18 0.72 5.16
## Post.Prob Star
## 1 0.84
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
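These one-sided posterior probabilities can also be computed directly from the draws, as the proportion of samples with the relevant sign — a sketch, assuming the fitted model object is still in memory:

```r
# Proportion of posterior draws above zero for the type coefficient;
# this should match the Post.Prob column of hypothesis() above (0.92).
draws <- as_draws_df(gesture_POS_mdl)
mean(draws$b_type_c1 > 0)
```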
Same for iconic gestures:
iconic_POS_mdl <- brm(iconic_gesture ~
# Fixed effects:
1 + type_c + POS_c + type_c:POS_c +
log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(iconic_POS_mdl, file = '../models/iconic_POS_mdl.RData')
Load and show model:
# Load model:
load('../models/iconic_POS_mdl.RData')
# Show model:
iconic_POS_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: iconic_gesture ~ 1 + type_c + POS_c + type_c:POS_c + log_freq + (1 | word) + (1 | url)
## Data: df (Number of observations: 1545)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.30 0.35 1.68 3.04 1.00 1757 3422
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.56 0.33 1.01 2.31 1.00 2678 4026
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -4.04 0.90 -5.97 -2.43 1.00 3308 4248
## type_c1 0.80 0.41 0.03 1.64 1.00 3526 4015
## POS_c1 0.63 0.34 -0.03 1.31 1.00 4211 4390
## log_freq 0.26 0.32 -0.36 0.91 1.00 3333 4309
## type_c1:POS_c1 0.52 0.34 -0.15 1.20 1.00 3736 4642
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Check what proportion of the posterior samples is of the same sign:
hypothesis(iconic_POS_mdl, 'type_c1 > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (type_c1) > 0 0.8 0.41 0.15 1.5 46.06 0.98 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_POS_mdl, 'POS_c1 > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (POS_c1) > 0 0.63 0.34 0.07 1.19 30.62 0.97 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_POS_mdl, 'type_c1:POS_c1 > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_c1:POS_c1) > 0 0.52 0.34 -0.04 1.09 15.16
## Post.Prob Star
## 1 0.94
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
The final model is for the non-iconic (= “other”) gestures only:
other_POS_mdl <- brm(non_iconic_gesture ~
# Fixed effects:
1 + type_c + POS_c + type_c:POS_c +
log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = filter(df, !is.na(non_iconic_gesture)),
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(other_POS_mdl, file = '../models/other_POS_mdl.RData')
Load and show model:
# Load model:
load('../models/other_POS_mdl.RData')
# Show model:
other_POS_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: non_iconic_gesture ~ 1 + type_c + POS_c + type_c:POS_c + log_freq + (1 | word) + (1 | url)
## Data: filter(df, !is.na(non_iconic_gesture)) (Number of observations: 1335)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 736)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.37 0.29 1.84 3.00 1.00 1640 3239
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.29 0.26 0.84 1.86 1.00 2517 4594
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.56 0.63 -1.78 0.71 1.00 3730 5032
## type_c1 -0.21 0.31 -0.85 0.40 1.00 3515 4348
## POS_c1 -0.27 0.28 -0.83 0.28 1.00 4470 5249
## log_freq 0.06 0.25 -0.46 0.54 1.00 3152 4561
## type_c1:POS_c1 -0.13 0.28 -0.68 0.41 1.00 4180 4767
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Check what proportion of the posterior samples is of the same sign:
hypothesis(other_POS_mdl, 'type_c1 < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (type_c1) < 0 -0.21 0.31 -0.73 0.31 2.92 0.75
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_POS_mdl, 'POS_c1 < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
## 1 (POS_c1) < 0 -0.27 0.28 -0.74 0.18 5.39 0.84
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_POS_mdl, 'type_c1:POS_c1 < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_c1:POS_c1) < 0 -0.13 0.28 -0.58 0.32 2.1
## Post.Prob Star
## 1 0.68
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Load Lancaster data:
lanc <- read_csv('../data/Lancaster_sensorimotor_norms_for_39707_words.csv')
# Show:
lanc
## # A tibble: 39,707 × 45
## Word Auditory.mean Gustatory.mean Haptic.mean Interoceptive.mean
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 A 2.21 0 0.429 0
## 2 A CAPPELLA 4.33 0 0.222 0.722
## 3 AARDVARK 1.62 0.562 1.62 0.0625
## 4 ABACK 1.29 0.0588 0.294 1.35
## 5 ABACUS 1.56 0.167 3.72 0.278
## 6 ABANDON 0.941 0.118 0.294 2.12
## 7 ABANDONED 1.44 0 0.0556 1.67
## 8 ABANDONEE 1.94 0 1.38 0.562
## 9 ABANDONER 0.571 0 0.143 1.64
## 10 ABANDONMENT 2.13 0.0667 1.2 2.93
## # ℹ 39,697 more rows
## # ℹ 40 more variables: Olfactory.mean <dbl>, Visual.mean <dbl>,
## # Foot_leg.mean <dbl>, Hand_arm.mean <dbl>, Head.mean <dbl>,
## # Mouth.mean <dbl>, Torso.mean <dbl>, Auditory.SD <dbl>, Gustatory.SD <dbl>,
## # Haptic.SD <dbl>, Interoceptive.SD <dbl>, Olfactory.SD <dbl>,
## # Visual.SD <dbl>, Foot_leg.SD <dbl>, Hand_arm.SD <dbl>, Head.SD <dbl>,
## # Mouth.SD <dbl>, Torso.SD <dbl>, Max_strength.perceptual <dbl>, …
Among the iconic words, the dominant modalities are only auditory, gustatory, haptic, and visual, so we’ll extract the perceptual strength measures for these, plus olfaction to complete the five senses.
Rename the columns to more convenient names, and make Word lowercase, both in the column name and in the actual content of the column:
lanc <- lanc |>
rename(word = Word,
dominant_modality = Dominant.perceptual,
dominant_action = Dominant.action,
auditory_strength = Auditory.mean,
visual_strength = Visual.mean,
haptic_strength = Haptic.mean,
gustatory_strength = Gustatory.mean,
olfactory_strength = Olfactory.mean,
hand_arm_mean = Hand_arm.mean,
max_strength = Max_strength.perceptual,
head_mean = Head.mean,
foot_leg_mean = Foot_leg.mean,
mouth_mean = Mouth.mean,
torso_mean = Torso.mean,
max_action = Max_strength.action) |>
mutate(word = str_to_lower(word))
Merge with main data:
df <- left_join(df,
select(lanc, word, dominant_modality, dominant_action,
auditory_strength, visual_strength, haptic_strength,
gustatory_strength, olfactory_strength,
hand_arm_mean, foot_leg_mean, mouth_mean, torso_mean, head_mean,
max_strength, max_action))
## Joining with `by = join_by(word)`
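After a left join like this, it is worth checking that every word actually found a match in the norms — a quick sanity check one might run (hypothetical, not part of the original script):

```r
# Words without a matching entry in the Lancaster norms (if any):
df |>
  filter(is.na(dominant_modality)) |>
  distinct(word)
```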
Merge with proportion table:
by_word_all <- left_join(by_word_all,
select(lanc, word, dominant_modality, dominant_action,
auditory_strength, visual_strength, haptic_strength,
gustatory_strength, olfactory_strength,
max_strength, max_action))
## Joining with `by = join_by(word)`
# Save outside of R:
by_word_all |>
write_csv('../data/by_word_proportions_with_modality.csv')
Which words are classified how?
select(by_word_all,
word, type, dominant_modality, dominant_action) |>
print(n = Inf)
## # A tibble: 45 × 4
## word type dominant_modality dominant_action
## <chr> <fct> <chr> <chr>
## 1 said non-iconic word Auditory Mouth
## 2 spank iconic word Haptic Hand_arm
## 3 slushy iconic word Visual Mouth
## 4 realize non-iconic word Interoceptive Head
## 5 inform non-iconic word Auditory Mouth
## 6 wearing non-iconic word Visual Foot_leg
## 7 yucky iconic word Gustatory Mouth
## 8 knew non-iconic word Interoceptive Head
## 9 filling non-iconic word Visual Mouth
## 10 other non-iconic word Visual Head
## 11 exact non-iconic word Visual Head
## 12 grateful non-iconic word Interoceptive Head
## 13 prevail non-iconic word Interoceptive Head
## 14 squish iconic word Haptic Hand_arm
## 15 tamper non-iconic word Visual Hand_arm
## 16 splotch iconic word Visual Head
## 17 put non-iconic word Visual Hand_arm
## 18 confirmed non-iconic word Visual Mouth
## 19 rejoin non-iconic word Visual Head
## 20 discern non-iconic word Visual Head
## 21 jealous non-iconic word Interoceptive Head
## 22 puffy iconic word Visual Head
## 23 covet non-iconic word Interoceptive Head
## 24 ordain non-iconic word Visual Mouth
## 25 swish iconic word Visual Head
## 26 wring iconic word Haptic Hand_arm
## 27 saggy iconic word Visual Head
## 28 swoosh iconic word Auditory Hand_arm
## 29 zap iconic word Haptic Hand_arm
## 30 outwit non-iconic word Visual Head
## 31 chomp iconic word Visual Mouth
## 32 crispy iconic word Haptic Mouth
## 33 absent non-iconic word Visual Head
## 34 wheeze iconic word Auditory Mouth
## 35 woof iconic word Auditory Head
## 36 sullen non-iconic word Visual Head
## 37 barking iconic word Auditory Head
## 38 acquaint non-iconic word Visual Head
## 39 bang iconic word Auditory Head
## 40 munch iconic word Gustatory Mouth
## 41 plump iconic word Visual Head
## 42 wobbly iconic word Visual Foot_leg
## 43 barren non-iconic word Visual Head
## 44 snap iconic word Auditory Hand_arm
## 45 gooey iconic word Haptic Hand_arm
Descriptive averages by dominant modality, also with counts:
# Create count table:
modality_type_counts <- by_word_all |>
count(type, dominant_modality)
# Create descriptive averages and merge with counts:
by_word_all |>
group_by(type, dominant_modality) |>
summarize(M_gesture = mean(gesture_rate),
M_iconic = mean(iconic_gesture_rate),
M_other = mean(other_gesture_rate)) |>
left_join(modality_type_counts)
## `summarise()` has grouped output by 'type'. You can override using the
## `.groups` argument.
## Joining with `by = join_by(type, dominant_modality)`
## # A tibble: 7 × 6
## # Groups: type [2]
## type dominant_modality M_gesture M_iconic M_other n
## <fct> <chr> <dbl> <dbl> <dbl> <int>
## 1 non-iconic word Auditory 0.262 0.0774 0.185 2
## 2 non-iconic word Interoceptive 0.488 0.0634 0.422 6
## 3 non-iconic word Visual 0.620 0.132 0.488 15
## 4 iconic word Auditory 0.717 0.238 0.479 6
## 5 iconic word Gustatory 0.532 0.118 0.414 2
## 6 iconic word Haptic 0.662 0.311 0.352 6
## 7 iconic word Visual 0.727 0.215 0.511 8
This clearly shows that visual words have the highest gesture rate, regardless of whether they are iconic or not. For iconic words, haptic, visual, and auditory words have higher gesture rates than gustatory ones; however, there are only two gustatory words, so we have to be careful not to over-interpret this, as it hinges on the specific iconic words chosen. This data sparsity issue only concerns the categorical modality labels, since the continuous ratings exist for all words.
Descriptive averages by dominant action:
# Create count table:
action_type_counts <- by_word_all |>
count(type, dominant_action)
# Create descriptive averages and merge with counts:
by_word_all |>
group_by(type, dominant_action) |>
summarize(M_gesture = mean(gesture_rate),
M_iconic = mean(iconic_gesture_rate),
M_other = mean(other_gesture_rate)) |>
left_join(action_type_counts)
## `summarise()` has grouped output by 'type'. You can override using the
## `.groups` argument.
## Joining with `by = join_by(type, dominant_action)`
## # A tibble: 8 × 6
## # Groups: type [2]
## type dominant_action M_gesture M_iconic M_other n
## <fct> <chr> <dbl> <dbl> <dbl> <int>
## 1 non-iconic word Foot_leg 0.714 0.0536 0.661 1
## 2 non-iconic word Hand_arm 0.705 0.325 0.380 2
## 3 non-iconic word Head 0.539 0.0973 0.441 15
## 4 non-iconic word Mouth 0.508 0.0702 0.438 5
## 5 iconic word Foot_leg 0.8 0.6 0.2 1
## 6 iconic word Hand_arm 0.704 0.275 0.429 7
## 7 iconic word Head 0.701 0.233 0.468 8
## 8 iconic word Mouth 0.637 0.143 0.494 6
There are only two foot/leg words (one per type), so we shouldn’t really interpret those cells. For iconic words, the action ratings don’t seem to matter as much as the modality, with mouth words a bit lower.
Plot proportions as a function of auditory_strength and the other perceptual measures. We’ll use a facet wrap for this, so it makes sense to have all the strength measures in long format, with an indicator variable saying which modality each strength value belongs to:
by_word_long <- by_word_all |>
select(type, word, gesture_rate, iconic_gesture_rate, other_gesture_rate,
auditory_strength:olfactory_strength) |>
pivot_longer(cols = auditory_strength:olfactory_strength,
names_to = 'modality',
values_to = 'strength')
# Show a few random rows:
by_word_long |>
sample_n(10)
## # A tibble: 10 × 7
## type word gesture_rate iconic_gesture_rate other_gesture_rate modality
## <fct> <chr> <dbl> <dbl> <dbl> <chr>
## 1 iconic wo… slus… 0.731 0.119 0.612 visual_…
## 2 iconic wo… yucky 0.464 0.0357 0.429 gustato…
## 3 iconic wo… spank 0.691 0.124 0.567 visual_…
## 4 non-iconi… covet 0.636 0.0909 0.545 visual_…
## 5 non-iconi… jeal… 0.467 0.1 0.367 haptic_…
## 6 non-iconi… other 0.66 0.18 0.48 visual_…
## 7 iconic wo… bark… 0.636 0 0.636 auditor…
## 8 non-iconi… tamp… 0.639 0.222 0.417 gustato…
## 9 non-iconi… other 0.66 0.18 0.48 gustato…
## 10 non-iconi… prev… 0.35 0 0.35 gustato…
## # ℹ 1 more variable: strength <dbl>
Make a plot out of this, for overall gesture proportion:
# Plot basics:
modality_p <- by_word_long |>
ggplot(aes(x = strength, y = gesture_rate, col = type)) +
geom_smooth(method = 'lm') +
geom_point()
# Axis and labels:
modality_p <- modality_p +
ylab('Gesture proportion') +
xlab('Perceptual strength rating') +
scale_color_manual(values = c('steelblue', 'goldenrod3')) +
facet_wrap(~modality)
# Look and feel:
modality_p <- modality_p +
theme_classic() +
theme(legend.position = 'bottom')
# Show and save:
modality_p
## `geom_smooth()` using formula = 'y ~ x'
ggsave(plot = modality_p,
filename = '../figures/modality_scatterplot_matrix.pdf',
width = 12, height = 6)
## `geom_smooth()` using formula = 'y ~ x'
Same for iconic gesture proportion:
# Plot basics:
modality_iconicity_p <- by_word_long |>
ggplot(aes(x = strength, y = iconic_gesture_rate,
col = type)) +
geom_smooth(method = 'lm') +
geom_point() +
facet_wrap(~modality)
# Axis and labels:
modality_iconicity_p <- modality_iconicity_p +
ylab('Gesture proportion') +
xlab('Perceptual strength rating') +
scale_color_manual(values = c('steelblue', 'goldenrod3'))
# Look and feel:
modality_iconicity_p <- modality_iconicity_p +
theme_classic() +
theme(legend.position = 'bottom')
# Show and save:
modality_iconicity_p
## `geom_smooth()` using formula = 'y ~ x'
ggsave(plot = modality_iconicity_p,
filename = '../figures/modality_scatterplot_matrix_iconic_gestures.pdf',
width = 12, height = 6)
## `geom_smooth()` using formula = 'y ~ x'
Do the same, but as a single scatterplot with just the max perceptual strength, and then the max action strength. First, overall gesture proportion:
# Plot basics:
modality_p <- by_word_all |>
ggplot(aes(x = max_strength, y = gesture_rate,
col = type, label = word)) +
geom_smooth(method = 'lm') +
geom_text()
# Axis and labels:
modality_p <- modality_p +
ylab('Gesture proportion') +
xlab('Max perceptual strength rating') +
scale_color_manual(values = c('steelblue', 'goldenrod3'))
# Look and feel:
modality_p <- modality_p +
theme_classic() +
theme(legend.position = 'bottom')
# Show and save:
modality_p
## `geom_smooth()` using formula = 'y ~ x'
## Warning: The following aesthetics were dropped during statistical transformation: label.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
ggsave(plot = modality_p,
filename = '../figures/max_strength_scatterplot.pdf',
width = 6.5, height = 4.5)
## `geom_smooth()` using formula = 'y ~ x'
## Warning: The following aesthetics were dropped during statistical transformation: label.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
## the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
## variable into a factor?
Not necessarily what one would expect, but pretty much anything is compatible with these data, since the confidence regions are so wide.
Same for iconic gesture proportion:
# Plot basics:
modality_p <- by_word_all |>
ggplot(aes(x = max_strength, y = iconic_gesture_rate,
col = type)) +
geom_smooth(method = 'lm') +
# label only in geom_text() so geom_smooth() doesn't warn about dropped aesthetics:
geom_text(aes(label = word))
# Axis and labels:
modality_p <- modality_p +
ylab('Iconic gesture proportion') +
xlab('Max perceptual strength rating') +
scale_color_manual(values = c('steelblue', 'goldenrod3'))
# Look and feel:
modality_p <- modality_p +
theme_classic() +
theme(legend.position = 'bottom')
# Show and save:
modality_p
## `geom_smooth()` using formula = 'y ~ x'
ggsave(plot = modality_p,
filename = '../figures/max_strength_scatterplot_iconic_gestures.pdf',
width = 6.5, height = 4.5)
## `geom_smooth()` using formula = 'y ~ x'
This suggests that the driver is the specific iconic affordances of each verb’s semantics rather than the overall amount of sensory content.
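Since patchwork is loaded above, the overall-gesture and iconic-gesture scatterplots could be arranged side by side. A minimal sketch, assuming the two plots were kept under distinct names (the chunks above reuse `modality_p` for both, so `gesture_p` and `iconic_p` here are hypothetical) and an illustrative output path:

```r
# Combine the two hypothetical plot objects into one figure,
# sharing a single legend via patchwork's guide collection:
combined_p <- gesture_p + iconic_p +
  plot_layout(guides = 'collect')
ggsave(plot = combined_p,
       filename = '../figures/max_strength_combined.pdf',  # illustrative path
       width = 10, height = 4.5)
```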
Same with action strength. First, a plot of the overall gesture proportion:
# Plot basics:
action_p <- by_word_all |>
ggplot(aes(x = max_action, y = gesture_rate,
col = type)) +
geom_smooth(method = 'lm') +
# label only in geom_text() so geom_smooth() doesn't warn about dropped aesthetics:
geom_text(aes(label = word))
# Axis and labels:
action_p <- action_p +
ylab('Gesture proportion') +
xlab('Max motor strength rating') +
scale_color_manual(values = c('steelblue', 'goldenrod3'))
# Look and feel:
action_p <- action_p +
theme_classic() +
theme(legend.position = 'bottom')
# Show and save:
action_p
## `geom_smooth()` using formula = 'y ~ x'
ggsave(plot = action_p,
filename = '../figures/max_action_scatterplot.pdf',
width = 6.5, height = 4.5)
## `geom_smooth()` using formula = 'y ~ x'
Nothing much going on.
Same for iconic gesture proportion:
# Plot basics:
action_p <- by_word_all |>
ggplot(aes(x = max_action, y = iconic_gesture_rate,
col = type)) +
geom_smooth(method = 'lm') +
# label only in geom_text() so geom_smooth() doesn't warn about dropped aesthetics:
geom_text(aes(label = word))
# Axis and labels:
action_p <- action_p +
ylab('Iconic gesture proportion') +
xlab('Max action strength rating') +
scale_color_manual(values = c('steelblue', 'goldenrod3'))
# Look and feel:
action_p <- action_p +
theme_classic() +
theme(legend.position = 'bottom')
# Show and save:
action_p
## `geom_smooth()` using formula = 'y ~ x'
ggsave(plot = action_p,
filename = '../figures/max_action_scatterplot_iconic_gestures.pdf',
width = 6.5, height = 4.5)
## `geom_smooth()` using formula = 'y ~ x'
Model with word type (iconic vs. reference level), max perceptual strength, max action strength, and log frequency:
gesture_mod_mdl <- brm(gesture ~
# Fixed effects:
1 + type_cleaned + max_strength + max_action +
log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(gesture_mod_mdl, file = '../models/gesture_mod_mdl.RData')
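As an aside, brms can manage this save-and-reload cycle itself: passing the `file` argument makes `brm()` write the fitted model to disk and, on subsequent calls, load it instead of refitting. A minimal sketch with the same formula (the path is illustrative; brms appends `.rds`):

```r
# If '../models/gesture_mod_mdl.rds' already exists, brm() loads it
# instead of sampling; otherwise it fits the model and saves it there:
gesture_mod_mdl <- brm(gesture ~ 1 + type_cleaned + max_strength +
                         max_action + log_freq + (1|word) + (1|url),
                       prior = weak_priors, family = bernoulli, data = df,
                       seed = 42, cores = 4, iter = 4000, warmup = 2000,
                       file = '../models/gesture_mod_mdl')
```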
Load and show model:
# Load:
load('../models/gesture_mod_mdl.RData')
# Show:
gesture_mod_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: gesture ~ 1 + type_cleaned + max_strength + max_action + log_freq + (1 | word) + (1 | url)
## Data: df (Number of observations: 1546)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.27 0.26 1.81 2.82 1.00 1661 3410
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.36 0.26 0.93 1.94 1.00 2149 3499
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.91 1.47 -1.16 4.72 1.00 2624 4024
## type_cleanediconic 1.32 0.67 0.03 2.68 1.00 2426 3770
## max_strength -0.52 0.44 -1.38 0.35 1.00 2402 3748
## max_action -0.02 0.30 -0.61 0.57 1.00 2457 3607
## log_freq 0.14 0.29 -0.42 0.71 1.00 2284 3221
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
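The regression coefficients above are on the log-odds (logit) scale. A quick sketch of translating them, using the point estimates from the summary (illustrative only, since the predictors are on their own rating scales and zero may lie outside their observed range):

```r
# Coefficients are log-odds: exp() gives an odds ratio,
# plogis() converts log-odds to probabilities.
exp(1.32)            # odds ratio for iconic vs. reference-level words, ~3.7
plogis(1.91)         # P(gesture) at the intercept, all predictors at 0, ~0.87
plogis(1.91 + 1.32)  # the same for an iconic-type word, ~0.96
```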
This calculates the posterior probability that the effect is positive, which is simply the proportion of the posterior distribution that lies above zero.
hypothesis(gesture_mod_mdl, 'type_cleanediconic > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) > 0 1.32 0.67 0.23 2.42 44.45
## Post.Prob Star
## 1 0.98 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
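The Post.Prob and Evid.Ratio columns can be reproduced directly from the posterior draws; a minimal sketch, assuming the fitted `gesture_mod_mdl` from above:

```r
draws <- as_draws_df(gesture_mod_mdl)          # one row per posterior draw
p_pos <- mean(draws$b_type_cleanediconic > 0)  # share of draws above zero
p_pos                # Post.Prob for the one-sided hypothesis
p_pos / (1 - p_pos)  # Evid.Ratio (posterior odds in favor)
```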
hypothesis(gesture_mod_mdl, 'max_strength < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (max_strength) < 0 -0.52 0.44 -1.23 0.2 7.63 0.88
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_mod_mdl, 'max_action < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (max_action) < 0 -0.02 0.3 -0.51 0.47 1.13 0.53
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Same for iconic gestures:
iconic_mod_mdl <- brm(iconic_gesture ~
# Fixed effects:
1 + type_cleaned + max_strength + max_action +
log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(iconic_mod_mdl, file = '../models/iconic_mod_mdl.RData')
Load and show model:
# Load model:
load('../models/iconic_mod_mdl.RData')
# Show model:
iconic_mod_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: iconic_gesture ~ 1 + type_cleaned + max_strength + max_action + log_freq + (1 | word) + (1 | url)
## Data: df (Number of observations: 1545)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.29 0.35 1.67 3.03 1.00 1743 3505
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.80 0.35 1.21 2.59 1.00 2616 4976
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -4.79 2.07 -9.19 -0.86 1.00 3634 4733
## type_cleanediconic 1.97 0.88 0.27 3.75 1.00 3169 4473
## max_strength -0.05 0.57 -1.18 1.08 1.00 3144 4369
## max_action -0.03 0.37 -0.76 0.70 1.00 3310 4280
## log_freq 0.35 0.38 -0.38 1.10 1.00 2984 4390
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Compute the posterior probability of each effect having the stated sign:
hypothesis(iconic_mod_mdl, 'type_cleanediconic > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) > 0 1.97 0.88 0.54 3.45 85.96
## Post.Prob Star
## 1 0.99 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_mod_mdl, 'max_strength < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (max_strength) < 0 -0.05 0.57 -0.97 0.91 1.16 0.54
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_mod_mdl, 'max_action < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (max_action) < 0 -0.03 0.37 -0.63 0.58 1.16 0.54
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Finally, the same model for only the non-iconic (= “other”) gestures:
other_mod_mdl <- brm(non_iconic_gesture ~
# Fixed effects:
1 + type_cleaned + max_strength + max_action +
log_freq +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = filter(df, !is.na(non_iconic_gesture)),
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(other_mod_mdl, file = '../models/other_mod_mdl.RData')
Load and show model:
# Load model:
load('../models/other_mod_mdl.RData')
# Show model:
other_mod_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: non_iconic_gesture ~ 1 + type_cleaned + max_strength + max_action + log_freq + (1 | word) + (1 | url)
## Data: filter(df, !is.na(non_iconic_gesture)) (Number of observations: 1335)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 736)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.37 0.29 1.85 3.00 1.00 1693 3184
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.27 0.26 0.83 1.84 1.00 2219 4253
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.80 1.41 -4.56 1.06 1.00 3006 4337
## type_cleanediconic -0.79 0.64 -2.11 0.44 1.00 2641 3749
## max_strength 0.51 0.43 -0.32 1.35 1.00 2503 3795
## max_action 0.03 0.28 -0.54 0.58 1.00 2892 3504
## log_freq -0.06 0.27 -0.63 0.47 1.00 2207 3781
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Compute the posterior probability of each effect having the stated sign:
hypothesis(other_mod_mdl, 'type_cleanediconic < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (type_cleanediconic) < 0 -0.79 0.64 -1.86 0.25 8.9
## Post.Prob Star
## 1 0.9
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_mod_mdl, 'max_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (max_strength) > 0 0.51 0.43 -0.18 1.21 8.07 0.89
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_mod_mdl, 'max_action > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (max_action) > 0 0.03 0.28 -0.44 0.49 1.19 0.54
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Model with the five individual perceptual strength ratings:
gesture_all_senses_mdl <- brm(gesture ~
# Fixed effects:
1 + auditory_strength +
visual_strength +
haptic_strength +
gustatory_strength +
olfactory_strength +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(gesture_all_senses_mdl, file = '../models/gesture_all_senses_mdl.RData')
Load and show model:
# Load:
load('../models/gesture_all_senses_mdl.RData')
# Show:
gesture_all_senses_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: gesture ~ 1 + auditory_strength + visual_strength + haptic_strength + gustatory_strength + olfactory_strength + (1 | word) + (1 | url)
## Data: df (Number of observations: 1546)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.31 0.27 1.83 2.89 1.00 1318 2752
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.25 0.26 0.82 1.82 1.00 2101 4090
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.25 1.22 -3.60 1.13 1.00 2383 3960
## auditory_strength 0.19 0.24 -0.27 0.67 1.00 2530 3881
## visual_strength 0.56 0.36 -0.15 1.26 1.00 2455 4097
## haptic_strength 0.34 0.26 -0.18 0.87 1.00 2765 3799
## gustatory_strength -0.08 0.43 -0.95 0.77 1.00 2564 3537
## olfactory_strength -0.22 0.68 -1.55 1.13 1.00 3341 4139
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
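With all five sense ratings entered jointly, their intercorrelations matter when reading the individual coefficients. A quick check on the word-level ratings (a sketch, assuming the columns carry the names used in the model formula):

```r
# One row per word, then the correlation matrix of the five sense ratings:
df |>
  distinct(word, auditory_strength, visual_strength, haptic_strength,
           gustatory_strength, olfactory_strength) |>
  select(-word) |>
  cor(use = 'pairwise.complete.obs') |>
  round(2)
```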
Again, calculate the posterior probability of each effect having the stated sign:
hypothesis(gesture_all_senses_mdl, 'auditory_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (auditory_strength) > 0 0.19 0.24 -0.19 0.59 3.59
## Post.Prob Star
## 1 0.78
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_all_senses_mdl, 'visual_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (visual_strength) > 0 0.56 0.36 -0.03 1.14 16.02
## Post.Prob Star
## 1 0.94
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_all_senses_mdl, 'haptic_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (haptic_strength) > 0 0.34 0.26 -0.08 0.77 9.77
## Post.Prob Star
## 1 0.91
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_all_senses_mdl, 'gustatory_strength < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (gustatory_strength) < 0 -0.08 0.43 -0.8 0.62 1.35
## Post.Prob Star
## 1 0.57
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_all_senses_mdl, 'olfactory_strength < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (olfactory_strength) < 0 -0.22 0.68 -1.33 0.9 1.75
## Post.Prob Star
## 1 0.64
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Same for iconic gestures:
iconic_all_senses_mdl <- brm(iconic_gesture ~
# Fixed effects:
1 + auditory_strength +
visual_strength +
haptic_strength +
gustatory_strength +
olfactory_strength +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(iconic_all_senses_mdl, file = '../models/iconic_all_senses_mdl.RData')
Load and show model:
# Load model:
load('../models/iconic_all_senses_mdl.RData')
# Show model:
iconic_all_senses_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: iconic_gesture ~ 1 + auditory_strength + visual_strength + haptic_strength + gustatory_strength + olfactory_strength + (1 | word) + (1 | url)
## Data: df (Number of observations: 1545)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.35 0.35 1.71 3.10 1.00 1646 3235
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.72 0.36 1.11 2.54 1.00 2410 4322
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -5.72 1.75 -9.18 -2.33 1.00 2959 4408
## auditory_strength 0.55 0.32 -0.07 1.18 1.00 3319 4191
## visual_strength 0.08 0.49 -0.92 1.02 1.00 3088 4434
## haptic_strength 0.86 0.35 0.19 1.56 1.00 2817 4184
## gustatory_strength -0.48 0.56 -1.64 0.58 1.00 3755 4239
## olfactory_strength 0.06 0.90 -1.70 1.87 1.00 4540 5507
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Compute the posterior probability of each effect having the stated sign:
hypothesis(iconic_all_senses_mdl, 'auditory_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (auditory_strength) > 0 0.55 0.32 0.02 1.08 21.86
## Post.Prob Star
## 1 0.96 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_all_senses_mdl, 'visual_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (visual_strength) > 0 0.08 0.49 -0.74 0.86 1.33
## Post.Prob Star
## 1 0.57
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_all_senses_mdl, 'haptic_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (haptic_strength) > 0 0.86 0.35 0.29 1.45 132.33
## Post.Prob Star
## 1 0.99 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_all_senses_mdl, 'gustatory_strength < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (gustatory_strength) < 0 -0.48 0.56 -1.43 0.42 4.33
## Post.Prob Star
## 1 0.81
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_all_senses_mdl, 'olfactory_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (olfactory_strength) > 0 0.06 0.9 -1.41 1.55 1.11
## Post.Prob Star
## 1 0.53
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Finally, the same model for only the non-iconic (= “other”) gestures:
other_all_senses_mdl <- brm(non_iconic_gesture ~
# Fixed effects:
1 + auditory_strength +
visual_strength +
haptic_strength +
gustatory_strength +
olfactory_strength +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = filter(df, !is.na(non_iconic_gesture)),
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(other_all_senses_mdl, file = '../models/other_all_senses_mdl.RData')
Load and show model:
# Load model:
load('../models/other_all_senses_mdl.RData')
# Show model:
other_all_senses_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: non_iconic_gesture ~ 1 + auditory_strength + visual_strength + haptic_strength + gustatory_strength + olfactory_strength + (1 | word) + (1 | url)
## Data: filter(df, !is.na(non_iconic_gesture)) (Number of observations: 1335)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 736)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.40 0.29 1.87 3.02 1.00 1808 3154
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.09 0.26 0.63 1.65 1.00 1750 2606
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.18 1.14 -1.09 3.48 1.00 2829 4175
## auditory_strength 0.03 0.22 -0.42 0.47 1.00 3079 4285
## visual_strength -0.65 0.35 -1.35 0.03 1.00 2700 3874
## haptic_strength -0.01 0.25 -0.52 0.48 1.00 3209 4163
## gustatory_strength -0.21 0.43 -1.04 0.66 1.00 3092 4328
## olfactory_strength 0.45 0.66 -0.85 1.76 1.00 3306 4590
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Compute the posterior probability of each effect having the stated sign:
hypothesis(other_all_senses_mdl, 'auditory_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (auditory_strength) > 0 0.03 0.22 -0.34 0.39 1.24
## Post.Prob Star
## 1 0.55
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_all_senses_mdl, 'visual_strength < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (visual_strength) < 0 -0.65 0.35 -1.23 -0.1 33.48
## Post.Prob Star
## 1 0.97 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_all_senses_mdl, 'haptic_strength < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (haptic_strength) < 0 -0.01 0.25 -0.43 0.4 1.05
## Post.Prob Star
## 1 0.51
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_all_senses_mdl, 'gustatory_strength < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (gustatory_strength) < 0 -0.21 0.43 -0.91 0.51 2.26
## Post.Prob Star
## 1 0.69
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_all_senses_mdl, 'olfactory_strength > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (olfactory_strength) > 0 0.45 0.66 -0.63 1.54 3.08
## Post.Prob Star
## 1 0.76
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Model with the individual action strength ratings for each body part:
gesture_all_motor_mdl <- brm(gesture ~
# Fixed effects:
1 + hand_arm_mean +
head_mean +
foot_leg_mean +
mouth_mean +
torso_mean +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(gesture_all_motor_mdl, file = '../models/gesture_all_motor_mdl.RData')
Load and show model:
# Load:
load('../models/gesture_all_motor_mdl.RData')
# Show:
gesture_all_motor_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: gesture ~ 1 + hand_arm_mean + head_mean + foot_leg_mean + mouth_mean + torso_mean + (1 | word) + (1 | url)
## Data: df (Number of observations: 1546)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.31 0.27 1.82 2.86 1.00 1661 3496
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.13 0.25 0.71 1.67 1.00 1947 3368
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 1.89 1.18 -0.42 4.24 1.00 2397 3472
## hand_arm_mean 0.34 0.30 -0.24 0.97 1.00 2478 3786
## head_mean -0.26 0.34 -0.94 0.42 1.00 2519 3709
## foot_leg_mean 0.54 0.44 -0.34 1.39 1.00 2657 4013
## mouth_mean -0.35 0.21 -0.76 0.06 1.00 2536 3811
## torso_mean -0.86 0.46 -1.77 0.04 1.00 2900 3744
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Again, calculate the posterior probability of each effect having the stated sign:
hypothesis(gesture_all_motor_mdl, 'hand_arm_mean > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (hand_arm_mean) > 0 0.34 0.3 -0.15 0.85 6.84 0.87
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_all_motor_mdl, 'head_mean < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (head_mean) < 0 -0.26 0.34 -0.83 0.3 3.79 0.79
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_all_motor_mdl, 'foot_leg_mean > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (foot_leg_mean) > 0 0.54 0.44 -0.18 1.26 8.21 0.89
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_all_motor_mdl, 'torso_mean < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (torso_mean) < 0 -0.86 0.46 -1.61 -0.12 31.39 0.97
## Star
## 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(gesture_all_motor_mdl, 'mouth_mean < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (mouth_mean) < 0 -0.35 0.21 -0.69 -0.02 22.88 0.96
## Star
## 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
Same for iconic gestures:
iconic_all_motor_mdl <- brm(iconic_gesture ~
# Fixed effects:
1 + hand_arm_mean +
head_mean +
foot_leg_mean +
mouth_mean +
torso_mean +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = df,
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(iconic_all_motor_mdl, file = '../models/iconic_all_motor_mdl.RData')
Load and show model:
# Load model:
load('../models/iconic_all_motor_mdl.RData')
# Show model:
iconic_all_motor_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: iconic_gesture ~ 1 + hand_arm_mean + head_mean + foot_leg_mean + mouth_mean + torso_mean + (1 | word) + (1 | url)
## Data: df (Number of observations: 1545)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 837)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.30 0.35 1.68 3.04 1.00 1600 3345
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.93 0.36 0.22 1.69 1.00 1011 902
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.92 1.20 -4.34 0.51 1.00 3817 4488
## hand_arm_mean 0.57 0.31 -0.06 1.18 1.00 3198 4340
## head_mean -0.55 0.36 -1.31 0.13 1.00 3965 4720
## foot_leg_mean 1.41 0.43 0.57 2.26 1.00 3629 4409
## mouth_mean -0.32 0.21 -0.74 0.10 1.00 3815 4733
## torso_mean -2.15 0.53 -3.23 -1.13 1.00 3179 4131
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Check what proportion of the posterior is of the predicted sign for each coefficient:
hypothesis(iconic_all_motor_mdl, 'hand_arm_mean > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (hand_arm_mean) > 0 0.57 0.31 0.05 1.06 25.85 0.96
## Star
## 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_all_motor_mdl, 'head_mean < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (head_mean) < 0 -0.55 0.36 -1.18 0.03 16.2 0.94
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_all_motor_mdl, 'foot_leg_mean > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (foot_leg_mean) > 0 1.41 0.43 0.71 2.1 799 1
## Star
## 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_all_motor_mdl, 'mouth_mean < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (mouth_mean) < 0 -0.32 0.21 -0.66 0.02 16.09 0.94
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(iconic_all_motor_mdl, 'torso_mean < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (torso_mean) < 0 -2.15 0.53 -3.03 -1.29 Inf 1
## Star
## 1 *
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
The final model is for the non-iconic (= “other”) gestures only:
other_all_motor_mdl <- brm(non_iconic_gesture ~
# Fixed effects:
1 + hand_arm_mean +
head_mean +
foot_leg_mean +
mouth_mean +
torso_mean +
# Random effects:
(1|word) + (1|url),
# More model specs:
prior = weak_priors,
family = bernoulli,
data = filter(df, !is.na(non_iconic_gesture)),
# MCMC estimation specs:
seed = 42, cores = 4,
iter = 4000, warmup = 2000)
# Save:
save(other_all_motor_mdl, file = '../models/other_all_motor_mdl.RData')
Load and show model:
# Load model:
load('../models/other_all_motor_mdl.RData')
# Show model:
other_all_motor_mdl
## Family: bernoulli
## Links: mu = logit
## Formula: non_iconic_gesture ~ 1 + hand_arm_mean + head_mean + foot_leg_mean + mouth_mean + torso_mean + (1 | word) + (1 | url)
## Data: filter(df, !is.na(non_iconic_gesture)) (Number of observations: 1335)
## Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
## total post-warmup draws = 8000
##
## Multilevel Hyperparameters:
## ~url (Number of levels: 736)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 2.42 0.30 1.89 3.04 1.00 1490 3121
##
## ~word (Number of levels: 45)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 1.20 0.27 0.74 1.78 1.00 2013 3150
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -1.07 1.24 -3.54 1.36 1.00 2261 3505
## hand_arm_mean -0.22 0.32 -0.87 0.41 1.00 2582 3706
## head_mean 0.09 0.36 -0.61 0.80 1.00 2417 3939
## foot_leg_mean 0.07 0.49 -0.89 1.04 1.00 2761 4059
## mouth_mean 0.32 0.22 -0.12 0.76 1.00 2108 2863
## torso_mean 0.18 0.50 -0.79 1.19 1.00 2870 3893
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Check what proportion of the posterior is of the predicted sign for each coefficient:
hypothesis(other_all_motor_mdl, 'hand_arm_mean < 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (hand_arm_mean) < 0 -0.22 0.32 -0.75 0.3 3.06 0.75
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_all_motor_mdl, 'head_mean > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (head_mean) > 0 0.09 0.36 -0.5 0.68 1.53 0.6
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_all_motor_mdl, 'foot_leg_mean > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (foot_leg_mean) > 0 0.07 0.49 -0.73 0.89 1.24 0.55
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_all_motor_mdl, 'mouth_mean > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (mouth_mean) > 0 0.32 0.22 -0.04 0.69 12.49 0.93
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
hypothesis(other_all_motor_mdl, 'torso_mean > 0')
## Hypothesis Tests for class b:
## Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
## 1 (torso_mean) > 0 0.18 0.5 -0.63 1.02 1.81 0.64
## Star
## 1
## ---
## 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
## '*': For one-sided hypotheses, the posterior probability exceeds 95%;
## for two-sided hypotheses, the value tested against lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.
First, let’s see how, for all gestures, the iconicity model compares against the all-sensory-strength and all-motor-strength models:
bayes_R2(gesture_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4848634 0.02663119 0.4309332 0.5354387
bayes_R2(gesture_all_senses_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4915309 0.02622017 0.4374499 0.5409381
bayes_R2(gesture_all_motor_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4914347 0.02585352 0.4388202 0.5403105
Next, same thing for iconic gestures only:
bayes_R2(iconic_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4547021 0.03812971 0.3773427 0.5251493
bayes_R2(iconic_all_senses_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4665948 0.03773457 0.3899429 0.5385222
bayes_R2(iconic_all_motor_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4587925 0.03801456 0.3837259 0.5309054
Then for all other gestures:
bayes_R2(other_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4875807 0.02941444 0.4278217 0.5429119
bayes_R2(other_all_senses_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4951544 0.02871305 0.4369144 0.5490764
bayes_R2(other_all_motor_mdl)
## Estimate Est.Error Q2.5 Q97.5
## R2 0.4973807 0.02874939 0.4390567 0.5502481
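As a reminder of what bayes_R2() reports: following Gelman and colleagues, a Bayesian R-squared is computed per posterior draw as the variance of the fitted values divided by that variance plus the residual variance, then summarized across draws. A toy base-R sketch with simulated (not model-derived) draws:

```r
# Simulated data and fake per-draw fitted values, for illustration only:
set.seed(1)
n_draws <- 200
n_obs <- 50
truth <- seq(-2, 2, length.out = n_obs)
y <- truth + rnorm(n_obs, sd = 1)

# Rows = posterior draws, columns = observations:
mu <- matrix(rep(truth, each = n_draws) + rnorm(n_draws * n_obs, sd = 0.1),
             nrow = n_draws)

# Per-draw R2 = var(fitted) / (var(fitted) + var(residuals)):
r2_draws <- apply(mu, 1, function(m) var(m) / (var(m) + var(y - m)))

c(Estimate = mean(r2_draws), Est.Error = sd(r2_draws))
```

This per-draw construction is why bayes_R2() returns an Est.Error and credible interval rather than a single point value.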
We have now coded a subset of the data for various morphosyntactic markers. We want to know where each word type (high gesture, low gesture, moderate gesture) falls within each category (POS, morphology, quotative verb, position in clause, immediate repetition).
First, let’s load the CSV.
df <- read_csv('../data/morphosyntactic_annotation_v2.csv')
Let’s check the column names.
colnames(df)
## [1] "id" "no"
## [3] "SUBTLEX POS" "type"
## [5] "word" "url"
## [7] "word_used" "comment"
## [9] "not_duplicate" "speaker_visible"
## [11] "hands_visible" "hands_free"
## [13] "syntactically integrated" "transcript"
## [15] "POS" "word in use"
## [17] "morphology" "quotative_verb"
## [19] "left_edge" "right_edge"
## [21] "repetition" "gesture"
## [23] "iconic_gesture" "gesture_description"
## [25] "beat" "deictic"
## [27] "metaphor" "pantomime"
## [29] "modelling" "depiction"
## [31] "context" "unscripted interview"
## [33] "presenting camera" "presenting screen"
## [35] "giving speech" "speaking in court"
## [37] "semi-scripted" "scripted acting"
There’s a lot! We won’t use them all. But we need to ensure the columns we are going to need are factors or characters. The columns are: word, type, POS, morphology, quotative_verb, and repetition. We also want to do position in clause, but we don’t have a direct column for that yet, so we’ll deal with that later.
df$word <- as.factor(df$word)
df$type <- as.factor(df$type)
df$POS <- as.factor(df$POS)
df$morphology <- factor(df$morphology, levels = c("Y", "N"))
df$quotative_verb <- factor(df$quotative_verb, levels = c("Y", "N"))
df$repetition <- factor(df$repetition, levels = c("Y", "N"))
Now let’s calculate the % POS for each word.
n_df <- df %>%
count(word, type, POS, name = "n")
n_df_complete <- n_df %>%
group_by(word, type) %>%
complete(POS = levels(df$POS), fill = list(n = 0)) %>%
ungroup()
pct_df <- n_df_complete %>%
group_by(word, type) %>%
mutate(
total = sum(n),
pct = if_else(total > 0, n / total * 100, 0)
) %>%
ungroup()
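The complete() step matters because count() only returns combinations that actually occur; without it, a word with zero tokens of some POS would simply be missing from the table rather than contributing 0%. A toy illustration with made-up words:

```r
library(tidyverse)

all_pos <- c("adjective", "noun", "verb")
toy <- tibble(word = c("bang", "bang", "zap"),
              POS  = c("verb", "verb", "noun"))

# count() returns only the observed combinations (2 rows here):
toy_counts <- toy %>% count(word, POS)

# complete() fills in the unobserved POS values with n = 0 per word:
toy_full <- toy_counts %>%
  group_by(word) %>%
  complete(POS = all_pos, fill = list(n = 0)) %>%
  ungroup()

toy_full  # 6 rows: each word now has a (possibly zero) count for every POS
```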
And now we’ll average the POS percentages across words, grouped by gesture type (high gesture, moderate gesture, low gesture).
type_pos_avg <- pct_df %>%
group_by(type, POS) %>%
summarise(avg_pct = mean(pct), .groups = "drop") %>%
pivot_wider(
names_from = POS,
values_from = avg_pct,
values_fill = 0
)
Let’s have a look at it.
type_pos_avg %>%
mutate_if(is.numeric, ~ round(.x, 1))
## # A tibble: 3 × 5
## type adjective interjection noun verb
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 high gesture 40 27.1 0 32.9
## 2 low gesture 49.7 0 12.1 38.2
## 3 moderate gesture 7.7 33.5 7.7 51.1
Great. Now let’s do the same for the morphology category. First, calculate the % Y/N for each word.
morph_counts <- df %>%
count(word, type, morphology, name = "n")
morph_complete <- morph_counts %>%
group_by(word, type) %>%
complete(morphology = c("Y", "N"), fill = list(n = 0)) %>%
ungroup()
morph_pct <- morph_complete %>%
group_by(word, type) %>%
mutate(
total = sum(n),
pct = if_else(total > 0, n / total * 100, 0)
) %>%
ungroup()
And calculate the average % by word type (high, low, moderate gesture).
type_morph_avg <- morph_pct %>%
group_by(type, morphology) %>%
summarise(avg_pct = mean(pct), .groups = "drop") %>%
pivot_wider(
names_from = morphology,
values_from = avg_pct,
values_fill = 0
)
Let’s take a look at it.
type_morph_avg %>%
mutate_if(is.numeric, ~ round(.x, 1))
## # A tibble: 3 × 3
## type N Y
## <fct> <dbl> <dbl>
## 1 high gesture 30 70
## 2 low gesture 24.8 75.2
## 3 moderate gesture 56.8 43.2
Nice. Now let’s do the same for the quotative_verb
category. First, calculate the % Y/N for each word.
quot_counts <- df %>%
count(word, type, quotative_verb, name = "n")
quot_complete <- quot_counts %>%
group_by(word, type) %>%
complete(quotative_verb = c("Y", "N"), fill = list(n = 0)) %>%
ungroup()
quot_pct <- quot_complete %>%
group_by(word, type) %>%
mutate(
total = sum(n),
pct = if_else(total > 0, n / total * 100, 0)
) %>%
ungroup()
And calculate the average % by word type (high, low, moderate gesture).
type_quot_avg <- quot_pct %>%
group_by(type, quotative_verb) %>%
summarise(avg_pct = mean(pct), .groups = "drop") %>%
pivot_wider(
names_from = quotative_verb,
values_from = avg_pct,
values_fill = 0
)
Let’s take a look at it.
type_quot_avg %>%
mutate_if(is.numeric, ~ round(.x, 1))
## # A tibble: 3 × 3
## type N Y
## <fct> <dbl> <dbl>
## 1 high gesture 92.5 7.5
## 2 low gesture 100 0
## 3 moderate gesture 84.4 15.6
Looks good. Now let’s do the same for the repetition category. First, calculate the % Y/N for each word.
rep_counts <- df %>%
count(word, type, repetition, name = "n")
rep_complete <- rep_counts %>%
group_by(word, type) %>%
complete(repetition = c("Y", "N"), fill = list(n = 0)) %>%
ungroup()
rep_pct <- rep_complete %>%
group_by(word, type) %>%
mutate(
total = sum(n),
pct = if_else(total > 0, n / total * 100, 0)
) %>%
ungroup()
And calculate the average % by word type (high, low, moderate gesture).
type_rep_avg <- rep_pct %>%
group_by(type, repetition) %>%
summarise(avg_pct = mean(pct), .groups = "drop") %>%
pivot_wider(
names_from = repetition,
values_from = avg_pct,
values_fill = 0
)
Let’s take a look at it.
type_rep_avg %>%
mutate_if(is.numeric, ~ round(.x, 1))
## # A tibble: 3 × 3
## type N Y
## <fct> <dbl> <dbl>
## 1 high gesture 75.7 24.3
## 2 low gesture 100 0
## 3 moderate gesture 71.6 28.4
Great stuff. Now let’s deal with the position in clause category. We have two Y/N columns with this info: left_edge and right_edge. We want to use this to group the items into four categories: embedded (left_edge = N, right_edge = N), left edge only (left_edge = Y, right_edge = N), right edge only (left_edge = N, right_edge = Y), and isolated (left_edge = Y, right_edge = Y). Let’s do that!
df <- df %>%
mutate(
word = factor(word),
type = factor(type),
left_edge = factor(left_edge, levels = c("Y","N")),
right_edge = factor(right_edge, levels = c("Y","N")),
position = case_when(
left_edge == "N" & right_edge == "N" ~ "embedded",
left_edge == "Y" & right_edge == "N" ~ "left_edge_only",
left_edge == "N" & right_edge == "Y" ~ "right_edge_only",
left_edge == "Y" & right_edge == "Y" ~ "isolated"
) %>% factor(levels = c("embedded","left_edge_only","right_edge_only","isolated"))
)
Now let’s calculate the % per word in each position category.
position_counts <- df %>%
count(word, type, position, name = "n")
position_complete <- position_counts %>%
group_by(word, type) %>%
complete(position, fill = list(n = 0)) %>%
ungroup()
position_pct <- position_complete %>%
group_by(word, type) %>%
mutate(
total = sum(n),
pct = if_else(total > 0, n / total * 100, 0)
) %>%
ungroup()
And calculate the average across words, according to type (high, low, moderate gesture).
type_position_avg <- position_pct %>%
group_by(type, position) %>%
summarise(avg_pct = mean(pct), .groups = "drop") %>%
pivot_wider(
names_from = position,
values_from = avg_pct,
values_fill = 0
)
Let’s take a look.
type_position_avg %>%
mutate_if(is.numeric, ~ round(.x, 1))
## # A tibble: 3 × 5
## type embedded left_edge_only right_edge_only isolated
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 high gesture 44.8 25.2 9.5 20.5
## 2 low gesture 77.3 10.6 12.1 0
## 3 moderate gesture 48.1 5.3 15.6 31
Cool. Let’s merge these into one big table.
First, we need to disambiguate the identical Y/N column names by giving them category prefixes.
type_morph_avg <- type_morph_avg %>% rename(morph_Y = Y, morph_N = N)
type_quot_avg <- type_quot_avg %>% rename(quot_Y = Y, quot_N = N)
type_rep_avg <- type_rep_avg %>% rename(repeat_Y = Y, repeat_N = N)
Now let’s gather all per-type tables into a list.
tables <- list(
type_pos_avg,
type_morph_avg,
type_quot_avg,
type_position_avg,
type_rep_avg
)
Now let’s full-join them one after the other on the “type” column.
merged_summary <- tables %>%
reduce(full_join, by = "type") %>%
mutate(across(
where(is.numeric),
~ round(.x, 0) # 0 decimal places
))
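reduce() folds the join over the list pairwise, i.e. full_join(full_join(t1, t2), t3), and so on. A minimal illustration with hypothetical tables:

```r
library(tidyverse)

# Three hypothetical per-type tables sharing a "type" key:
t1 <- tibble(type = c("high", "low"), a = 1:2)
t2 <- tibble(type = c("high", "low"), b = 3:4)
t3 <- tibble(type = c("high", "low"), c = 5:6)

# reduce() applies full_join pairwise down the list:
merged <- list(t1, t2, t3) %>% reduce(full_join, by = "type")
merged  # 2 rows, 4 columns: type, a, b, c
```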
And let’s have a look.
print(merged_summary)
## # A tibble: 3 × 15
## type adjective interjection noun verb morph_N morph_Y quot_N quot_Y
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 high gesture 40 27 0 33 30 70 93 7
## 2 low gesture 50 0 12 38 25 75 100 0
## 3 moderate ges… 8 34 8 51 57 43 84 16
## # ℹ 6 more variables: embedded <dbl>, left_edge_only <dbl>,
## # right_edge_only <dbl>, isolated <dbl>, repeat_N <dbl>, repeat_Y <dbl>
Looks good, but it’s hard to read. Let’s transpose it so the types are columns and the categories are rows.
# Rename POS and position columns so we have the category in there
renamed_summary <- merged_summary %>%
rename_with(~ paste0("POS_", .), .cols = c("adjective", "noun", "verb", "interjection")) %>%
rename_with(~ paste0("position_", .), .cols = c("embedded", "left_edge_only", "right_edge_only", "isolated"))
# Pivot into long form
readable_summary <- renamed_summary %>%
pivot_longer(
cols = -type,
names_to = "metric",
values_to = "percentage"
) %>%
extract(
col = metric,
into = c("category", "level"),
regex = "^([^_]+)_(.*)$"
) %>%
pivot_wider(
names_from = type,
values_from = percentage
)
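The regex "^([^_]+)_(.*)$" splits each name at the first underscore only, so multi-underscore levels like left_edge_only survive intact. A quick check on hypothetical column names:

```r
library(tidyverse)

toy <- tibble(metric = c("POS_adjective", "position_left_edge_only"))

# [^_]+ grabs everything up to the FIRST underscore; .* keeps the rest whole:
split_toy <- toy %>%
  extract(metric, into = c("category", "level"), regex = "^([^_]+)_(.*)$")
split_toy
```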
print(readable_summary)
## # A tibble: 14 × 5
## category level `high gesture` `low gesture` `moderate gesture`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 POS adjective 40 50 8
## 2 POS interjection 27 0 34
## 3 POS noun 0 12 8
## 4 POS verb 33 38 51
## 5 morph N 30 25 57
## 6 morph Y 70 75 43
## 7 quot N 93 100 84
## 8 quot Y 7 0 16
## 9 position embedded 45 77 48
## 10 position left_edge_only 25 11 5
## 11 position right_edge_only 10 12 16
## 12 position isolated 20 0 31
## 13 repeat N 76 100 72
## 14 repeat Y 24 0 28
And save it outside of R.
write.csv(
readable_summary,
file = '../data/morphosyntactic_category.csv',
row.names = FALSE
)
Now we want to calculate the iconic gesture rate for each word according to each of the categories we’ve been working with. The low gesture words don’t have any iconic gestures, so we’re just dealing with the moderate and high gesture words.
First, let’s recode iconic_gesture into a clean 0/1 variable and then filter to these words.
df <- df %>%
# Turn everything into a character so we catch blanks/factors
mutate(iconic_raw = as.character(iconic_gesture)) %>%
# Recode: “1” → 1, everything else (including "" and NA) → 0
# (if_else() keeps NA as NA by default, so we set missing = 0L explicitly)
mutate(iconic_gesture = if_else(iconic_raw == "1", 1L, 0L, missing = 0L)) %>%
select(-iconic_raw)
# Keep only high & moderate gesture words (since low have no iconic gestures)
df2 <- df %>%
filter(type %in% c("high gesture", "moderate gesture"))
Now let’s calculate the iconic gesture rates for each word by
category. First, define a function called
calc_word_iconic.
calc_word_iconic <- function(data, grp) {
grp_q <- enquo(grp)
data %>%
group_by(type, word, !!grp_q) %>%
summarise(
total_items = n(),
iconic_items = sum(iconic_gesture, na.rm = TRUE),
iconic_rate = if_else(total_items > 0,
100 * iconic_items / total_items,
0),
.groups = "drop"
)
}
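As an aside, the enquo()/!! pair used above can also be written with the {{ }} embrace operator available in newer rlang/dplyr versions. A sketch of the same pattern on a hypothetical toy data frame:

```r
library(dplyr)

# Same tidy-eval pattern as calc_word_iconic, via the {{ }} embrace operator:
calc_rate <- function(data, grp) {
  data %>%
    group_by({{ grp }}) %>%
    summarise(rate = 100 * mean(flag), .groups = "drop")
}

# Hypothetical data: a grouping column and a 0/1 flag
toy <- tibble(g = c("a", "a", "b"), flag = c(1, 0, 1))
res <- calc_rate(toy, g)
res
```

Both spellings are equivalent here; {{ grp }} is simply shorthand for !!enquo(grp).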
And now let’s use the function to calculate the iconic gesture rate by word, by POS.
word_pos_iconic <- calc_word_iconic(df2, POS)
And let’s take a look:
print(word_pos_iconic)
## # A tibble: 16 × 6
## type word POS total_items iconic_items iconic_rate
## <fct> <fct> <fct> <int> <int> <dbl>
## 1 high gesture bang interjection 3 2 66.7
## 2 high gesture bang verb 3 3 100
## 3 high gesture squish adjective 14 12 85.7
## 4 high gesture squish interjection 4 4 100
## 5 high gesture squish verb 17 12 70.6
## 6 high gesture wobbly adjective 4 2 50
## 7 high gesture wobbly interjection 1 1 100
## 8 moderate gesture chomp interjection 1 1 100
## 9 moderate gesture chomp verb 11 2 18.2
## 10 moderate gesture woof interjection 10 5 50
## 11 moderate gesture woof noun 2 0 0
## 12 moderate gesture woof verb 1 0 0
## 13 moderate gesture zap adjective 3 0 0
## 14 moderate gesture zap interjection 2 1 50
## 15 moderate gesture zap noun 1 0 0
## 16 moderate gesture zap verb 7 2 28.6
Looks good. Let’s get the average across words according to their type (high or moderate gesture).
type_pos_iconic <- word_pos_iconic %>%
group_by(type, POS) %>%
summarise(
avg_iconic_rate = mean(iconic_rate),
.groups = "drop"
) %>%
pivot_wider(
names_from = POS,
values_from = avg_iconic_rate
) %>%
mutate(
across(where(is.numeric), ~ round(.x, 0))
)
How does it look?
print(type_pos_iconic)
## # A tibble: 2 × 5
## type adjective interjection verb noun
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 high gesture 68 89 85 NA
## 2 moderate gesture 0 67 16 0
Great, now we have iconic gesture rates for high and moderate gesture words by POS. We want to do the same for each of the other morphosyntactic categories. Next, morphology.
word_morph_iconic <- calc_word_iconic(df2, morphology)
And let’s take a look:
print(word_morph_iconic)
## # A tibble: 10 × 6
## type word morphology total_items iconic_items iconic_rate
## <fct> <fct> <fct> <int> <int> <dbl>
## 1 high gesture bang Y 3 3 100
## 2 high gesture bang N 3 2 66.7
## 3 high gesture squish Y 21 17 81.0
## 4 high gesture squish N 14 11 78.6
## 5 high gesture wobbly Y 5 3 60
## 6 moderate gesture chomp Y 10 2 20
## 7 moderate gesture chomp N 2 1 50
## 8 moderate gesture woof N 13 5 38.5
## 9 moderate gesture zap Y 6 2 33.3
## 10 moderate gesture zap N 7 1 14.3
Looks good. Let’s get the average across words according to their type (high or moderate gesture).
type_morph_iconic <- word_morph_iconic %>%
group_by(type, morphology) %>%
summarise(
avg_iconic_rate = mean(iconic_rate),
.groups = "drop"
) %>%
pivot_wider(
names_from = morphology,
values_from = avg_iconic_rate
) %>%
mutate(
across(where(is.numeric), ~ round(.x, 0))
)
And let’s look.
print(type_morph_iconic)
## # A tibble: 2 × 3
## type Y N
## <fct> <dbl> <dbl>
## 1 high gesture 80 73
## 2 moderate gesture 27 34
Looks good. Same for quotative verb.
word_quot_iconic <- calc_word_iconic(df2, quotative_verb)
And let’s take a look:
print(word_quot_iconic)
## # A tibble: 11 × 6
## type word quotative_verb total_items iconic_items iconic_rate
## <fct> <fct> <fct> <int> <int> <dbl>
## 1 high gesture bang Y 1 0 0
## 2 high gesture bang N 5 5 100
## 3 high gesture squish Y 2 2 100
## 4 high gesture squish N 33 26 78.8
## 5 high gesture wobbly N 5 3 60
## 6 moderate gesture chomp Y 1 1 100
## 7 moderate gesture chomp N 11 2 18.2
## 8 moderate gesture woof Y 4 2 50
## 9 moderate gesture woof N 9 3 33.3
## 10 moderate gesture zap Y 1 0 0
## 11 moderate gesture zap N 12 3 25
Looks good. Let’s get the average across words according to their type (high or moderate gesture).
type_quot_iconic <- word_quot_iconic %>%
group_by(type, quotative_verb) %>%
summarise(
avg_iconic_rate = mean(iconic_rate),
.groups = "drop"
) %>%
pivot_wider(
names_from = quotative_verb,
values_from = avg_iconic_rate
) %>%
mutate(
across(where(is.numeric), ~ round(.x, 0))
)
And look.
print(type_quot_iconic)
## # A tibble: 2 × 3
## type Y N
## <fct> <dbl> <dbl>
## 1 high gesture 50 80
## 2 moderate gesture 50 26
Great. Now position in clause.
word_position_iconic <- calc_word_iconic(df2, position)
And let’s take a look:
print(word_position_iconic)
## # A tibble: 20 × 6
## type word position total_items iconic_items iconic_rate
## <fct> <fct> <fct> <int> <int> <dbl>
## 1 high gesture bang left_edge_only 3 3 100
## 2 high gesture bang isolated 3 2 66.7
## 3 high gesture squish embedded 26 21 80.8
## 4 high gesture squish left_edge_only 2 1 50
## 5 high gesture squish right_edge_only 3 2 66.7
## 6 high gesture squish isolated 4 4 100
## 7 high gesture wobbly embedded 3 1 33.3
## 8 high gesture wobbly left_edge_only 1 1 100
## 9 high gesture wobbly right_edge_only 1 1 100
## 10 moderate gesture chomp embedded 9 2 22.2
## 11 moderate gesture chomp left_edge_only 1 0 0
## 12 moderate gesture chomp right_edge_only 1 0 0
## 13 moderate gesture chomp isolated 1 1 100
## 14 moderate gesture woof embedded 2 0 0
## 15 moderate gesture woof left_edge_only 1 0 0
## 16 moderate gesture woof right_edge_only 1 0 0
## 17 moderate gesture woof isolated 9 5 55.6
## 18 moderate gesture zap embedded 7 2 28.6
## 19 moderate gesture zap right_edge_only 4 0 0
## 20 moderate gesture zap isolated 2 1 50
Looks good. Let’s get the average across words according to their type (high or moderate gesture).
type_position_iconic <- word_position_iconic %>%
group_by(type, position) %>%
summarise(
avg_iconic_rate = mean(iconic_rate),
.groups = "drop"
) %>%
pivot_wider(
names_from = position,
values_from = avg_iconic_rate
) %>%
mutate(
across(where(is.numeric), ~ round(.x, 0))
)
And let’s look.
print(type_position_iconic)
## # A tibble: 2 × 5
## type embedded left_edge_only right_edge_only isolated
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 high gesture 57 83 83 83
## 2 moderate gesture 17 0 0 69
Looks good. Finally, let’s do the same for repetition.
word_rep_iconic <- calc_word_iconic(df2, repetition)
And let’s check:
print(word_rep_iconic)
## # A tibble: 12 × 6
## type word repetition total_items iconic_items iconic_rate
## <fct> <fct> <fct> <int> <int> <dbl>
## 1 high gesture bang Y 3 2 66.7
## 2 high gesture bang N 3 3 100
## 3 high gesture squish Y 1 1 100
## 4 high gesture squish N 34 27 79.4
## 5 high gesture wobbly Y 1 1 100
## 6 high gesture wobbly N 4 2 50
## 7 moderate gesture chomp Y 1 1 100
## 8 moderate gesture chomp N 11 2 18.2
## 9 moderate gesture woof Y 8 4 50
## 10 moderate gesture woof N 5 1 20
## 11 moderate gesture zap Y 2 1 50
## 12 moderate gesture zap N 11 2 18.2
Wonderful. Let’s get the average across words according to their type (high or moderate gesture).
type_rep_iconic <- word_rep_iconic %>%
group_by(type, repetition) %>%
summarise(
avg_iconic_rate = mean(iconic_rate),
.groups = "drop"
) %>%
pivot_wider(
names_from = repetition,
values_from = avg_iconic_rate
) %>%
mutate(
across(where(is.numeric), ~ round(.x, 0))
)
And check it.
print(type_rep_iconic)
## # A tibble: 2 × 3
## type Y N
## <fct> <dbl> <dbl>
## 1 high gesture 89 76
## 2 moderate gesture 67 19
Perfect. Now let’s merge them into one table, as before. First, we want to tweak the names in each table so the Y/N columns get a clear prefix.
pos_tbl <- type_pos_iconic %>%
rename_with(~ paste0("iconic_POS_", .), -type)
morph_tbl <- type_morph_iconic %>%
rename_with(~ paste0("iconic_morph_", .), -type)
quot_tbl <- type_quot_iconic %>%
rename_with(~ paste0("iconic_quot_", .), -type)
posn_tbl <- type_position_iconic %>%
rename_with(~ paste0("iconic_position_", .), -type)
rept_tbl <- type_rep_iconic %>%
rename_with(~ paste0("iconic_repetition_", .), -type)
Now put them into a list and full-join by type.
merged_iconic_rates <- list(
pos_tbl,
morph_tbl,
quot_tbl,
posn_tbl,
rept_tbl
) %>%
reduce(full_join, by = "type")
And let’s check it.
print(merged_iconic_rates)
## # A tibble: 2 × 15
## type iconic_POS_adjective iconic_POS_interjection iconic_POS_verb
## <fct> <dbl> <dbl> <dbl>
## 1 high gesture 68 89 85
## 2 moderate gesture 0 67 16
## # ℹ 11 more variables: iconic_POS_noun <dbl>, iconic_morph_Y <dbl>,
## # iconic_morph_N <dbl>, iconic_quot_Y <dbl>, iconic_quot_N <dbl>,
## # iconic_position_embedded <dbl>, iconic_position_left_edge_only <dbl>,
## # iconic_position_right_edge_only <dbl>, iconic_position_isolated <dbl>,
## # iconic_repetition_Y <dbl>, iconic_repetition_N <dbl>
Looks good, but it’s hard to read, so let’s transpose it to have the types as columns and the categories as rows.
readable_iconic <- merged_iconic_rates %>%
# Gather all the metric‐columns into key/value pairs
pivot_longer(
cols = -type,
names_to = "metric",
values_to = "rate"
) %>%
# Strip the "iconic_" prefix, then split into category & level
mutate(
metric = sub("^iconic_", "", metric)
) %>%
separate(
metric,
into = c("category", "level"),
sep = "_",
extra = "merge"
) %>%
# Spread the two types into their own columns
pivot_wider(
names_from = type,
values_from = rate
)
print(readable_iconic)
## # A tibble: 14 × 4
## category level `high gesture` `moderate gesture`
## <chr> <chr> <dbl> <dbl>
## 1 POS adjective 68 0
## 2 POS interjection 89 67
## 3 POS verb 85 16
## 4 POS noun NA 0
## 5 morph Y 80 27
## 6 morph N 73 34
## 7 quot Y 50 50
## 8 quot N 80 26
## 9 position embedded 57 17
## 10 position left_edge_only 83 0
## 11 position right_edge_only 83 0
## 12 position isolated 83 69
## 13 repetition Y 89 67
## 14 repetition N 76 19
And save it outside of R.
write.csv(
readable_iconic,
file = '../data/iconic_gesture_rates_by_morphosyntactic_category.csv',
row.names = FALSE
)
This completes the analysis.